Theoretical models and experimental results relevant to the study of behavioral issues in
the use of text editors, including both those intended primarily for computer program
development and those intended for manuscript preparation, are examined. Models can
predict editing task time in terms of elementary activities, in an error-free environment,
to an accuracy comparable to the variability between subjects. In a realistic setting,
however, unpredictable user activities account for between 25 and 50 percent of the task
time, an amount that is comparable to individual variations due to errors. Variations in
computer response time appear to affect users more than mere delay does. Command
options improve expert performance but degrade the performance of beginners. The
surface syntax of an editor can have considerable impact on ease of use. Ergonomic
aspects of keyboard and display terminal design and use are well understood, with little
hope for significant improvement, but there is no experimental evidence to support
guidelines for display format design. Among analog pointing devices the mouse appears to
have a small edge over the light pen, joystick, and track ball; human pointing
performance using these devices approaches known psychophysical limits. Optimum
ambient conditions, including temperature, noise, work-station layout, illumination, and
work-rest cycles, derived for professional key entry operators and for other interactive
tasks are probably also valid for editing. Gaps in the application of cognitive psychology
and human engineering to text editors in the literature are indicated, and promising
research areas are delineated.

INTRODUCTION

The objective of this paper is to give an 
account of the methods in use for studying
the behavioral aspects of text editors and 
to present the results obtained. In the in- 
troduction we briefly describe the various 
functions of computer editors, discuss di- 
verse means of studying them, and provide 
pointers to applicable areas of psychology. 
It is assumed that the reader is familiar 
with the basic vocabulary of computer science,
has had sufficient exposure to various
text and program editors to have built up 
firm opinions regarding them, and is inno- 
cent of any formal training in psychology. 

For background reading in the more com- 
prehensive area of interactive systems, we 
recommend the Infotech report on “Man/Computer
Communications” (with a bibliography
of 225 titles) [Info79], Martin’s
popular Design of Man/Computer Dialogues
[Mart73], and the International
Journal of Man-Machine Studies. 
Interactive Text Editors

Interactive text editors allow the manipu- 
lation of a set of files stored on the host 
machine by means of a terminal device such 
as a teletypewriter or a display-keyboard 
combination. The files may contain natural 
language text, computer programs, or al- 
phanumeric data. Because editors fre- 
quently constitute the primary means of 
interaction of a person with the computer, 
they tend to subsume all kinds of secondary 
functions as well. 

In a narrow sense one may consider the
editing process as a transformation from an 
existing string of symbols known as the 
source file (which, in the case of initial text 
entry, may be null) to a new string of sym- 
bols known as the target file [Oren74, 
Heck78, Anan80]. This definition must be 
stretched to accommodate hierarchical ed- 
itors, which also contain information about 
the semantic structure of the files; format- 
ting editors, which change margins, justify 
text, center headers, and provide other 
typesetting commands; and language-dependent
editors, which accommodate the
syntactic rules of a programming language
(or, eventually, even of natural language!). 
An even broader interpretation is necessary 
to include file organization, message sys- 
tems, and access to utility programs. Edi- 
tors often also provide status information, 
such as the number of current users or the 
time of the day, and access to the output of
compilers, interpreters, and other programs
submitted for execution. 

A thorough survey of interactive editors
appears in VanD71, with more recent contributions
referenced in Ridd76 and
Reim78. Widely known examples of general-purpose
interactive editors are QED,
CMS, TECO, Wylbur, WIDJET, and
UNIX [Deut67, IBM76, Teco69, Stan75,
IBM78, Kern79]. Unlike general-purpose
editors, language-dependent editors are restricted
to program modification in some
specified programming language. Therefore,
they can incorporate syntax validation,
diagnostic messages, and error-correcting
functions normally reserved for
compilers and interpreters [Hans71a,
Teit79]. Many conversational languages
such as BASIC, APL, and LISP include an
editor that is actually considered part of
the language environment [Iver62,
Bing76, Keme71, Sand78].

Interactive editors have been imple- 
mented on large time-shared computer sys- 
tems, on dedicated microcomputers, and on 
word-processing systems designed primarily
for office applications [Carl78]. On
some document preparation systems the 
editor itself is interactive, but many of the 
formatting commands take effect only at 
the time the final printed copy is generated 
[Ossa77]. While all of these configurations
fall within the purview of our survey, batch 
editors such as PANVALET [Pans77], 
which are designed to log all changes and 
to preserve back-up versions of programs in 
the maintenance of large software systems, 
and key data entry systems for massive 
data-processing operations [Gilb77], do 
not. 

To summarize, the functions of interac- 
tive editors of primary interest to us are (1) 
the creation, modification, and execution of 
computer programs; (2) the preparation of 
documents for human use; and (3) the examination
and retrieval of portions of program,
text, and data files.

Editor Design and Evaluation 

In discussing editors firm opinions abound: 
everyone, from greenhorn to old hand, 
knows exactly what the best and worst fea- 
tures of given editors are and just how new 
editors ought to be designed. The only
problem is the striking lack of consensus. 
Nor are there universally acceptable means 
of determining who is right: the distinction 
between conventional dogma and scientific 
fact is often blurred. There are, neverthe- 
less, some established means of studying 
editors and the editing process. In the next 
few paragraphs we sketch the available 
sources of knowledge, starting with the
most subjective. 

Introspection, our own intuition and ex- 
perience, is what we depend on when we 
assume that we know as much about the 
topic as the next person and are too lazy to 
look further. It is surprising how many pro- 
grams intended for use by others, including 
text editors, appear to be based on this slim 
foundation, if we can judge by the lack of 
references to previous work in the publica- 
tions describing them. 

Field studies or field observations collect 
information in situ without substantial in- 
terference with the system or phenomenon 
under study. They range in scope from an- 
ecdotal evidence, which extends one’s own 
experience to the isolated and fortuitous 
observation of others, to carefully recorded 
systematic observations carried out with 
deliberate planning of the features to be 
noted. 

Formal analysis draws conclusions from 
a theoretical (for instance, syntactic or 
probabilistic) model of the editing system 
under study. Such analyses may suggest 
behavioral parameters for experimentation 
but have proved to be of limited value to 
date. 

Controlled experiments restrict the num- 
ber of variables to be manipulated and ob- 
served and attempt to minimize the effects 
of all other factors. The observed (or “de- 
pendent”) variables are recorded as the 
controlled (“independent”) variables are 
either held constant or driven, singly or 
jointly, through a predetermined range of
interest. 

Psychological models characterize hu- 
man performance and add the important 
element of prediction. The principal goal of 
these models is to predict human behavior
in a restricted environment while performing
a set of tasks. Purely descriptive abstractions,
on the other hand, cannot be
extended to situations other than those for
which all parameters are already known.
Since behavioral studies that focus specifically
on text editor usage are relatively
scarce, we also draw on relevant studies 
performed in other contexts. 

Applicable Areas of Psychology 

The publication in 1971 of Gerald Weinberg’s
influential The Psychology of Computer
Programming may be considered as
marking the beginning of the behavioral
approach to computer science [Wein71].
As the continuing increase in the cost-effec- 
tiveness of computer equipment exposes 
more and more people without specialized 
training (and without the tolerance that 
reflects such training) to computers, many 
concepts and methods that are second na- 
ture to psychologists will find increasing 
application to the study of the human element
[Mill77a, Shne80].

As computer scientists interested in the 
study of text editors, we need not, fortu- 
nately, concern ourselves with all aspects of 
psychology. The area of psychology that 
appears most relevant to our needs is cog- 
nitive psychology, the study of higher men- 
tal processes such as memory, perception, 
learning, thinking, reasoning, language, and 
understanding. And as important as the 
topics themselves is the commitment to the 
observational view of science rather than a 
literary, intuitive, or humanistic point of 
view. An excellent introduction to the par- 
adigms of modern cognitive psychology and 
a summary of the principal findings are 
presented in Lach79. 

Bridging conventional psychology and
the hard-science disciplines is the field of 
human factors engineering, or ergonomics, 
which burgeoned during World War II to 
deal with wartime problems in the area of 
skilled performance [Welf76]. Many interesting
examples of human-factors-oriented
design can be found in Drey55. To quote
from Lachman, the most important ideas 
borrowed by current research in informa- 
tion-processing psychology from human 
factors engineering are 

(1) the view of man as an information 
transmitter and decision maker, 

(2) the idea that there are limits to how 
much information he can transmit, 

(3) the theory of signal detectability, 

(4) continuing access to the concepts of the 
physical sciences, 

(5) a reliance on sophisticated instrumen- 
tation, and 

(6) a taste for federal funding. 

While cognitive psychology abounds in 
elegant and sophisticated studies of mental 
processes, and many important facts have 
been established beyond dispute, its tools 
have been applied to the study of text edi- 
tors only in rudimentary fashion. The 
weight given to various components of this 
survey is dictated by what we found in the 
literature. 

Section 1 presents the development of 
temporal models, in which the duration of 
an editing task is predicted by analysis of 
its constituent components (e.g., command 
entry). The behavioral effects of the dura- 
tion and variability of computer response 
times are also discussed. 

The study of the impact of editor struc- 
ture and command languages on behavior 
patterns is examined in Section 2. We re- 
view attempts to develop methods for dif- 
ferentiating among the characteristics of 
popular present-day editors ostensibly de- 
signed for similar applications. 

Among the major achievements of cog- 
nitive psychology has been the develop- 
ment of a series of models for the accurate 
prediction of reaction times as a function of 
stimulus and response modality, number, 
complexity, and similarity. Reaction time is 
one of the favorite tools of experimental 
psychology. Simple reaction time (one 
stimulus, one response) is the time it takes 
to press a button when a light goes on 
(about 180 ms). Complex tasks, which in- 
clude many stimuli and many responses 
(such as typing), require consideration of 
stimulus categorization and response selection
in addition to simple reaction time.
The discussion of key entry and pointing
skills in Section 3 presents several attempts 
to extend classical reaction-time studies to 
more complex tasks. 

The final section reflects our own opin- 
ions, based on this study, of where the most 
success has been achieved in applying cog- 
nitive psychology to text-editor design and 
evaluation, where the work appears to be 
headed for success, and where much greater 
effort and ingenuity are required. 

1. PERFORMANCE TIME CONSIDERATIONS 

An objective of the design and evaluation 
of text editors is to minimize the cost in- 
curred by a user performing a number of 
editing tasks over a period of time. Ulti- 
mately, this cost is a function of the time 
taken by the user and the computer to 
complete each individual task. One may 
conjecture that task time depends on the 
nature of the task, the expertise of the user, 
the sluggishness of the machine response, 
and the time spent learning and relearning 
methods and procedures. Many factors, in- 
cluding the user’s alertness and motivation, 
the availability of documentation and help, 
the editor’s command structure, the ease of 
committing and correcting errors, and 
whether or not a hard-copy device runs out 
of paper, may also influence task time. 
Some of these time factors are beyond the
control of the designer, but others can be 
evaluated and improved. 

1.1 Simple Models for Predicting Task Time

Direct measurements of elapsed editing- 
session time are difficult to interpret be- 
cause several not so easily controlled fac- 
tors are introduced: the choice of editing 
commands, user alertness and motivation, 
and errors— both minor and disastrous. 

Predictive models avoid these difficulties 
and further have the advantage of being 
useful at design time [Card76, Card78a,
Card80b, Embl78]. These models are
based on quantities such as keystroke
count, typing rate, computer response time,
and mental preparation time. The predictive
power of these models depends on how
accurately the constituent quantities can
be estimated and on the validity of any simplifying
assumptions.
Card et al. propose a model that predicts
task time for expert users performing routine
tasks by simply accumulating the time
required to perform individual unit tasks
[Card80b]:

$$T_{\text{task}} = \sum_{\text{all unit tasks}} T_{\text{unit task}}$$

A large editing task, such as making nu- 
merous changes, additions, and deletions in 
a document, is considered to be a series of 
small, cognitively manageable, quasi-inde- 
pendent subtasks, called unit tasks. A unit 
task may correspond to a single command 
of an editor, or it may correspond to a short 
sequence of related commands to perform 
an action such as moving a block of code. It 
may also be a lengthy task such as typing 
in a document from a manuscript. Essen- 
tially, a unit task is a subtask for which a 
user has an available method and is describ- 
able in a simple phrase or two. 

Unit task time depends on how long it 
takes a user to acquire a mental represen- 
tation of the task and then perform or 
execute it:

$$T_{\text{unit task}} = T_{\text{acquire}} + T_{\text{execute}}$$


The execution time is further refined as

$$T_{\text{execute}} = T_k + T_p + T_h + T_r + T_m$$

where

$T_k = n_k t_k$ is the keying time and depends on the number of
    keystrokes, $n_k$, in the unit task and on the typing rate,
    $1/t_k$; $t_k$ = 0.2-1.0 second [Seib72].

$T_p = n_p t_p$ is the pointing time and depends on the number of
    times the user points at something on the screen, $n_p$, and
    on the estimated time it takes to reference a screen position
    with a cursor under control of a “mouse,” $t_p$ = 1.10 seconds
    [Engl67, Card78b].

$T_h = n_h t_h$ is the homing time and depends on the number of
    times the user’s hands move from one device to another,
    $n_h$, and on the estimated time for hand movement between
    any two devices, $t_h$ = 0.40 second [Card76, Card78b].

$T_r$ is the response time, which is command dependent, and is
    considered only if the user has to wait.

$T_m = n_m t_m$ is the mental time and depends on the number of
    decisions that must be made, $n_m$, and on the estimated time
    to make a decision, $t_m$ = 1.35 seconds [Card80b].

With the exception of mental time, it is 
easy to count the instances of each of these 
elemental actions and to accumulate the 
times, given the details of the methods used 
to accomplish the unit tasks. Mental time 
represents the time the user takes to pre- 
pare for executing a physical action. It is 
assumed to be constant and contributes to 
the task time whenever a decision must be 
made, except when the decision can be 
overlapped with such independent compo- 
nents as computer response time. Heuristic 
rules that specify how to count the contri- 
bution of mental time during a unit editing 
task are established and, in essence, assert 
that unless the next operation is anticipated 
by a previous one, a mental operation oc- 
curs [CARD80b]. 

As an example of how $T_{\text{execute}}$ is calculated,
consider the task of replacing one
word of arbitrary length with a five-letter
word. The task can be performed as shown
in Figure 1 using DISPED (an experimental
display-based system at the Xerox Palo
Alto Research Center). Assuming the user
is an average skilled typist, the time per
keystroke, $t_k$, is 0.20 second, and the predicted
time to execute the unit task is 6.2
seconds.
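
The arithmetic behind this example can be sketched in a few lines of Python. This is only an illustration of the execution-time formula, not code from the study: the function name is hypothetical, and the operator counts are an assumed reading of the Figure 1 action sequence (in particular, charging two mental preparations, one before the command and one before terminating the entry, is an assumption). With these counts the sum comes to about 6.2 seconds.

```python
# Operator times in seconds, as listed in the model's parameter definitions.
T_K = 0.20  # time per keystroke for an average skilled typist
T_P = 1.10  # time to point at a screen position with the mouse
T_H = 0.40  # time to move the hands between devices
T_M = 1.35  # time for one mental preparation


def execution_time(n_k, n_p, n_h, n_m, t_r=0.0, t_k=T_K):
    """Return T_execute = n_k*t_k + n_p*t_p + n_h*t_h + T_r + n_m*t_m."""
    return n_k * t_k + n_p * T_P + n_h * T_H + t_r + n_m * T_M


# Operator counts for the Figure 1 sequence (an assumed reading of the figure):
#   keystrokes: select button (1) + replace command (1) + new word (5)
#               + terminator (1) = 8
#   pointing:   1 (point to the word)
#   homing:     2 (to the mouse and back to the keyboard)
#   mental:     2 (before the command and before terminating; assumption)
replace_word_time = execution_time(n_k=8, n_p=1, n_h=2, n_m=2)

print(f"Predicted execution time: {replace_word_time:.1f} s")
```

Keeping the response time as an optional argument mirrors the model's rule that it is counted only when the user actually has to wait.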

An experiment to determine how accu- 
rately this keystroke model predicts per- 
formance times was conducted. Twelve 
subjects performed ten versions of four dif- 
ferent editing tasks on each of three differ- 
ent editors— POET (a dialect of the QED 
Editor) [Deut67], SOS [Savi69], and
DISPED — for a total of 480 observations. 
To avoid transfer effects, no subject was 
observed on more than one editor. All sub- 
jects were experts on the system they used, 
and all of the tasks performed were routine, 
ranging from a simple word substitution to 
the more difficult task of moving a sentence.
Methods for accomplishing the four different
editing tasks were prescribed. Each experimental
session lasted approximately 40 minutes and
consisted of a practice session and a test
session in which subjects performed editing
tasks marked on manuscript pages in red ink.

Initially, the user's hands are on the keyboard.

1. Reach for mouse
2. Point to word on screen
3. Press button to “select” word
4. Home hands on keyboard
5. Execute replace command (type “R”)
6. Type new five-character word
7. Terminate entry

Figure 1. Action sequence for replacing a word of arbitrary length with a
five-letter word using the DISPED editor.

Results are shown in Table 1. Tasks on
which there were significant errors or in
which the user did not use the prescribed
method were excluded from consideration.
As can be seen, predicted execution time
matched the mean observed time reasonably
well for most tasks but varied widely in
a few instances. It must be remembered,
however, that this is for a single unit task;
because of the law of large numbers, predicted
time for a succession of unit tasks
should be more accurate.

TABLE 1. Calculated and Observed Execution Times (a)

Calculated time = $n_k \times t_k + n_p \times 1.10 + n_h \times 0.40 + T_r + n_m \times 1.35$

Task  Editor  n_k  t_k   n_p  n_h  Calculated (s)  Observed mean (s)  Prediction error (%)
1     POET    15   0.23            8.8             7.8                11
1     SOS     19   0.22            9.6             9.5                1
1     DISPED  8    0.23  1    2    6.4             5.7                11
2     POET    14   0.28            9.4             8.9                5
2     SOS     18   0.2?            9.5             9.7                -3
2     DISPED  4    0.21  1    2    5.6             4.1                26
3     POET    12   0.19            6.3             6.3                0
3     SOS     7    0.23            4.3             4.0                8
3     DISPED  2    0.23  1    1    3.3             3.5                -7
4     POET    92   0.19            (illegible)     37.1               -6
4     SOS     47   0.23            26.8            32.7               -22
4     DISPED  6    0.24  3    1    11.6            14.3               -23

(a) Adapted from Card80b.

In order to predict the total task time,
task acquisition time was also estimated:
1.8 seconds to look at the manuscript only
and 4.0 seconds to look at both the manuscript
and screen. With this estimate of
acquisition times, the prediction of task
times by the keystroke model was accurate
to within 5 percent.
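
The earlier remark that a succession of unit tasks should be predicted more accurately than a single unit task can be illustrated with a short calculation. The sketch below assumes that the prediction errors of the individual unit tasks are independent and unbiased, so that their variances add; the function name and the session figures (25 unit tasks of 8 seconds, each with a 20 percent standard error) are hypothetical and chosen only for illustration.

```python
import math


def relative_error_of_total(mean_times, relative_errors):
    """Relative error of a summed prediction, assuming independent,
    unbiased errors on the individual unit tasks (variances add)."""
    if len(mean_times) != len(relative_errors) or not mean_times:
        raise ValueError("need one relative error per unit task")
    variance = sum((t * e) ** 2 for t, e in zip(mean_times, relative_errors))
    return math.sqrt(variance) / sum(mean_times)


# Hypothetical session: 25 unit tasks of 8 seconds each, every one predicted
# with a 20 percent standard error.
single = relative_error_of_total([8.0], [0.20])
session = relative_error_of_total([8.0] * 25, [0.20] * 25)

print(f"one unit task: {single:.0%}, 25 unit tasks: {session:.0%}")
```

For equal unit tasks the relative error of the total shrinks as one over the square root of the number of tasks, which is why a session-level prediction can be far tighter than any single unit-task prediction.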

Questions naturally arise as to whether 
simplified versions of this keystroke model 
might predict session time equally well or 
whether an even more detailed analysis 
would yield better results. Card and his 
colleagues analyzed several simplifications 
but found none to be as accurate 
[Card80b].

A similar model for line-oriented editors
had been proposed earlier by Embley and
his colleagues, who describe a keystroke
model in which acquisition time and mental
time are considered as a single parameter
and in which unit tasks are single command-response
pairs [Embl78]. Specifically,

$$T_{\text{task}} = m T_c + n T_k$$

where $m$ is the number of command-response
pairs, $T_c$ is the delay per command
consisting of the mental preparation time
and the computer response time, $n$ is the
number of keystrokes, and $T_k$ is time per
keystroke. The quantities $m$ and $n$ depend
only on the editing task to be performed
and on the available command language;
thus if $m$ and $n$ can be measured with
reasonable accuracy, then task time
can be predicted for various values
of command delay time and typing rate.

TABLE 2. Normalized Task Time Difference (a, b)

                              Novice                          Intermediate
Project  Initial entry (c)    Modification 1  Modification 2  Modification 1  Modification 2
1        10                   -27             -9              18              19
2        19                   -34             -8              (illegible)     (illegible)
3        14                   11              -21             35              18
4        25                   -18             -19             33              (illegible)

Mean                     17            -19.4          26.(illegible)
Confidence interval (d)  ±8.5          ±10.3          (illegible)
Conclusion               CMS better    NUROS better   CMS better

(a) Adapted from Embl78.
(b) Normalized task time difference is computed by
    (NUROS task time - CMS task time) / NUROS task time × 100.
(c) The initial entry is identical for both Novice and Intermediate command sets.
(d) The confidence interval of the mean is calculated with Student’s t-test at 95 percent certainty.
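
As a rough illustration, the command-response model and the normalized difference used in Table 2 can be written as two small Python functions. The default parameters are the 5-second command delay and 0.5-second keystroke time that the study reports applying; the function names and the command and keystroke counts below are hypothetical, and the factor of 100 assumes the differences are expressed in percent.

```python
def task_time(m, n, t_c=5.0, t_k=0.5):
    """T_task = m*T_c + n*T_k, in seconds.

    m: number of command-response pairs
    n: number of keystrokes
    """
    return m * t_c + n * t_k


def normalized_difference(nuros_time, cms_time):
    """(NUROS - CMS) / NUROS * 100; positive values favor CMS."""
    return (nuros_time - cms_time) / nuros_time * 100.0


# Hypothetical counts for one editing task performed with each editor.
nuros = task_time(m=20, n=400)  # 20*5.0 + 400*0.5
cms = task_time(m=15, n=330)    # 15*5.0 + 330*0.5

print(f"NUROS {nuros:.0f} s, CMS {cms:.0f} s, "
      f"difference {normalized_difference(nuros, cms):.0f} percent")
```

Because m and n depend only on the task and the command language, the same counts can be reused to explore other values of command delay and typing rate simply by changing `t_c` and `t_k`.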

An objective of this study was to inves- 
tigate a procedure for comparing program 
editor performance as a function of the time 
required for a user to perform editing tasks. 
By way of example, the model was applied 
to command subsets of NUROS [UNCN79] 
and CMS [IBM76] suitable for novice and 
intermediate programmers. Twelve editing 
tasks were defined by arbitrarily selecting 
three versions of four student programming 
projects— an initial version, an intermedi- 
ate version, and a final version. Four sets of 
commands were established: the NUROS 
novice and intermediate sets and the CMS 
novice and intermediate sets. With these 
command sets an experimenter (an expert 
user of both CMS and NUROS) studied 
the editing tasks and then performed them. 
Command-response pairs (m) and keystrokes
(n) were counted, and the model
was applied with parameters $T_c$ = 5 seconds
and $T_k$ = 0.5 second. The results are shown
in Table 2.

Actual task time was not an objective in
this study, so empirical data for comparison
with the keystroke model of Card and colleagues
are not available. Since the keystroke
model makes a finer distinction of
the delay per command, however, it should
be a better predictor.

1.2 More Comprehensive Models 

Models with very detailed mental-time operators
have also been proposed. Treu
[Treu75] presents a model for the mental
work involved in text-editing tasks that is
based on action primitives and their relationship
to system commands, but reports
no experimental data to verify his hypothesis.
An experimental study where the question
of level of detail is addressed was performed,
however, by Card et al. in an investigation
of their GOMS model of text editing
[Card76, Card80a].

The GOMS models describe a user’s performance
in terms of goals, operators,
methods for achieving the goals, and selection
rules for choosing among competing
methods (Figure 2). It is a theory that attempts
to explain how an expert user accomplishes
routine editing tasks and thus
models more than just the various time
constituents that make up task completion
time.

GOMS
Constituents      Examples

Goals             Edit manuscript
                  Edit unit task
                  Locate line
                  Modify text

Operators         Use substitute command
                  Look at manuscript
                  Verify edit

Methods           For locating a line alternative methods are
                    string search
                    move forward/backward
                    combinations of the above two methods
                  For changing a line some alternatives are
                    delete-insert
                    change

Selection rules   For locating a line selection rules might be
                    Rule 1: Use string search as default
                    Rule 2: Use forward/backward if the distance is known
                  For changing a line selection rules might be
                    Rule 1: Use change command as default
                    Rule 2: Use delete-insert if it takes fewer keystrokes

Figure 2. Sample goals, operators, methods, and selection rules for a GOMS
model.
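
One way to make the GOMS constituents concrete is to write them down as a small data structure. The sketch below is only an illustration under stated assumptions: the class names are hypothetical, the operator durations are invented placeholders rather than measured values, and a unit task is predicted simply by summing the durations of the operators in the methods chosen for its goals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    name: str
    seconds: float  # hypothetical duration, for illustration only


@dataclass(frozen=True)
class Method:
    goal: str          # the goal this method achieves
    name: str          # label of the method
    operators: tuple   # operators executed in sequence

    def duration(self):
        """Predicted time for the method: sum of its operator durations."""
        return sum(op.seconds for op in self.operators)


# Operators named in Figure 2, plus a search operator; durations are placeholders.
LOOK = Operator("look at manuscript", 2.0)
SEARCH = Operator("use string search", 4.0)
SUBSTITUTE = Operator("use substitute command", 4.0)
VERIFY = Operator("verify edit", 2.0)

locate_by_search = Method("locate line", "string search", (LOOK, SEARCH))
modify_by_change = Method("modify text", "change", (SUBSTITUTE, VERIFY))

# A unit task is carried out by one method per subgoal.
unit_task = [locate_by_search, modify_by_change]
predicted = sum(method.duration() for method in unit_task)

print(f"Predicted unit-task time: {predicted:.1f} s")
```

Representing methods as sequences of operators makes it straightforward to split an operator into finer ones without changing how the prediction is computed.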

A feature of the GOMS model is its abil- 
ity to adjust to either more or less detail. 
The substitute command, for instance, can 
be expressed as a unit or, in more detail, as 
“specify substitute command-specify argu- 
ment number 1-specify argument number 
2-enter command.” The level of detail is 
called the grain of analysis. As the grain 
becomes finer, the models expose lower 
level operations and render them suscepti- 
ble to measurement; thus one might argue 
that these fine-grained models are poten- 
tially more accurate. It is known, however, 
that the times required for operators in a 
sequence may be interdependent, especially 
in a fine-grain analysis [Abru56]. Further- 
more, measurements of individual fine- 
grain operators are typically less accurate 
than measurements of coarse-grain opera- 
tors. 

Since it is difficult to know what grain 
size to use, several variations of the GOMS 
model were explored. The models ranged
from “very coarse,” where a single operator
of constant duration represents each unit 
task, to “quite fine,” where operators such 
as “home hands on keyboard” or “type an 
s” are of less than half-second duration. 
Within this time stratification models on 
the same level differed by the degree that 
alternative operators (or sequences of op- 
erators) were considered. For example, one 
model with 4-second operations used single 
operators for each functional step (“locate 
line,” “modify text”) whereas another 
model at the same level considered the 
same functional operators but divided them 
into separate cases on the basis of the meth- 
ods used to accomplish them (“string 
search” or “move forward/backward” 
rather than “locate line”). 

In order to test the GOMS model and to 
see the effects of the grain of analysis, an 
experiment was conducted using ten differ- 
ent GOMS models: one model at the level 
of about 16-second operator duration, two 
at about 4-second operator duration, four 
at about 2-second duration, and three at 
about 0.5-second duration. Although five 
subjects participated, the volume and detail 
of data permitted an intensive analysis of
only one subject. The data from this subject
were partitioned into two sets, a derivation
data set from which prediction rules for
operator sequences and estimates for operator
duration were derived and a cross-validation
data set which was preserved for
calculation of unit task duration using predicted
operator sequences and durations.
To obtain an upper bound on the models’
predictive power, the durations of unit tasks
in the derivation data were also calculated
using the actual operator sequences.

Figure 3. Accuracy of GOMS models (adapted from Card78a). The horizontal
axis runs from the least detailed models (about 16-s operator duration) to the
most detailed models (about 0.5-s operator duration). x, reproduction of
derivation data; o, prediction of cross-validation data.

The results are shown in Figure 3. The 
root-mean-square difference between a set 
of predicted and observed unit task times 
is expressed as a percentage of the average 
observed unit task time. The accuracy of 
the reproduction of the derivation data improves
from just under 40 percent when the
average unit task time is the predictor to
about 20 percent for the most detailed
model. The main result, however, is that
the predictions based on the cross-validation
data are all about equally accurate,
with a root-mean-square error of about 30
percent of the mean observed task time.
Grain size of the model appears to be of
little consequence unless the sequence of
operators can be predicted nearly perfectly.
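
The accuracy measure used in this comparison, the root-mean-square difference between predicted and observed unit task times expressed as a percentage of the mean observed time, can be sketched as follows. The function name and the sample times are hypothetical; only the definition of the measure is taken from the text.

```python
import math


def rms_percent_error(predicted, observed):
    """RMS difference between predicted and observed unit task times,
    as a percentage of the mean observed unit task time."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need matching, non-empty sequences")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    rms = math.sqrt(sum(squared) / len(squared))
    mean_observed = sum(observed) / len(observed)
    return 100.0 * rms / mean_observed


# Hypothetical unit task times in seconds.
predicted_times = [10.0, 12.0, 8.0, 14.0]
observed_times = [13.0, 9.0, 11.0, 11.0]

error = rms_percent_error(predicted_times, observed_times)
print(f"RMS error: {error:.1f} percent of the mean observed time")
```

Normalizing by the mean observed time makes models with different operator grains directly comparable on the same set of unit tasks.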
Although an error of 30 percent seem 
high, predicting editing times unit task b 
unit task for a single user is a very stringen 
test. If task time rather than unit task tint 
were predicted using a predictor statist: 
cally adjusted to be unbiased, then the pei 
cent prediction error would drop approx: 
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TAHLE 3. Selection Rules fok the Loc.uk Coal" 

Percent of 
( 'uses 

User Rule Predicted 

( ' urrertly 
(Cunndu- 

tie e) 


si (tty) 1. Search by specifying a string in the desired line unless another rule applies (if) 

2. If the number of lines to the next text to be modified is less than three, 78 

move forward/backward one line at a time 

3. If the desired line is the last line of a page, search by speed ving the end of 85 

page marker (a $) 

S2 (tty) 1. Move forward/backward by specifying number of lines to move unless 77 

another rule applies 

2. If the number of lines to the next text to be modified is fewer than three, 94 
move forward/backward one line at a time 

5 2 i CRT) 1. Move forward/backward one line at a time unless another rule applies (18 

2. If the number of lines to the next text to be modified is greater than nine, 74 

move forward/backward by specifying the number of lines to move 

3. If the desired line is on the next page of the manuscript, move forward one 85 

line at a time 

53 (CRT) 1. Search by specifying a string in the desired line unless another rule applies 62 

2, If the number of lines to the next text to be modified is fewer than five, 92 
move forward/backward one line at a time 

Average predicted correctly by use of all rules for each subject 89 

* Adapted from CakdBQu. 


mately as the square root of the number of unit tasks. For an editing
session consisting of about as many (73) unit tasks as were marked on
the manuscript, the prediction models would be accurate to within 3-4
percent.

Besides the investigation of time prediction and analysis of grain
size, Card and his colleagues also investigated how accurately the GOMS
model might predict the methods a user would select to accomplish a
task. The objective was to determine whether a set of simple selection
rules could account for the methods users select.

An experiment was conducted in which the method selection to locate a
line with the POET editor was observed. Four experiments were run in
which subjects were given manuscripts with 73 corrections marked in red
ink. In two of the experiments subjects simply located the line; in the
other two experiments subjects also edited the manuscript. Two of the
experiments were run with teletypewriters and two with CRT displays.
Three subjects participated; one subject repeated the experiment after
a two-week interval and performed one experiment on a teletypewriter
and the other on a CRT.

A summary of the results is shown in Table 3. Each subject appeared to
have a dominant method, the rule listed first. Apparently users apply
the dominant method unless it is obviously inefficient. Note that S2
applied one dominant method while using the teletypewriter and another
dominant method while using the CRT, presumably because of the speed
difference between devices. The selection of methods also depends on
the features of the task. For locating a line the most important
characteristic of the task is the number of lines between the current
line and the line with the text to be next modified.

Observing (1) that a high percentage of the methods selected can be
accounted for by a few simple rules and (2) that expert users certainly
do not take time to make elaborate calculations to determine which
method to use leads to the conclusion that users are able to quickly
select near-optimum methods by having assimilated heuristic rules based
on a few pertinent task features. Since it might be conjectured that if
users cannot easily choose among alternatives, they will either ignore
one of the methods or will agonize over which to use, designers who
provide alternative methods for accomplishing goals should have in mind
clear decision rules for deciding among alternatives.

                   TECO   Wylbur   NLS   Wang   C.V. of Means   Mean of C.V.

Total-time data     47      42      30    24        0.30            0.28
Error-free data     41      35      24    22        0.30            0.18

Adapted from Robe79.
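
The selection rules of Table 3 read naturally as a small decision procedure. The following is a minimal sketch for subject S1 on the teletypewriter only; the function name, the method labels, and the order in which the two exception rules are tested are assumptions made for illustration, since the table does not say which exception wins when both apply.

```python
def select_locate_method(lines_away: int, last_line_of_page: bool = False) -> str:
    """Pick a line-locating method using the rules listed for S1 (tty).

    lines_away        -- signed distance, in lines, to the next text to modify
    last_line_of_page -- True if the target is the last line of a page
    The precedence of the two exception rules is an assumption.
    """
    if last_line_of_page:
        # Rule 3: search for the end-of-page marker.
        return "search-for-page-marker"
    if abs(lines_away) < 3:
        # Rule 2: short hops are made one line at a time.
        return "step-one-line-at-a-time"
    # Rule 1: the dominant method, used unless another rule applies.
    return "search-for-string"


if __name__ == "__main__":
    for distance in (1, 2, 3, 12):
        print(distance, select_locate_method(distance))
```

A designer could write one such function per intended method family and check that every task situation maps to exactly one method, which is the "clear decision rules" requirement stated above.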

1.3 Observations of Overall Task Time 

In the studies of the keystroke and GOMS 
models, it is assumed that users are experts 
and that they perform editing tasks per- 
fectly — without error. For the data analyses 
error data were either discarded or folded 
into the data as if no errors had occurred. 
Even for expert users, however, from 5 to 
30 percent of actual editing time can usually 
be attributed to errors and error correc- 
tions. If an accurate prediction of task time 
is desired, errors must also be considered. 

In an attempt to provide a comprehen- 
sive basis for the evaluation of text editors, 
Roberts investigated errors and also consid- 
ered text editors from the point of view of 
several kinds of users doing different kinds 
of work [Robe79]. Because of practical
limitations, however, Roberts actually de- 
veloped and performed only a few of the 
many experiments she suggested. 

In one of the experiments she investi- 
gated task time on four text editors: TECO 
[Bolt73], Wylbur [Stan75], NLS
[Augm75], and Wang [Wang78]. Four experts
performed four separate editing tasks:
they entered a short memorandum, modi- 
fied two business letters, and corrected an 
excerpt from a text on philosophy. Since 
one of the objectives of the study was to 
provide evaluation schemes that are quick 
and straightforward to apply, she did not 
assume the availability of sophisticated 
data-recording equipment. Instead, a hu- 
man observer noted the time at the begin- 
ning and end of the task and used a stop- 
watch to obtain the time spent making and 
correcting errors. Only errors that took
more than 30 seconds to correct were recorded;
small mistakes such as typographic
errors that were caught and corrected immediately
were folded into the editing time.
The observer also kept track of tasks accomplished
incorrectly or skipped, and directed
subjects to correct or complete them.
This time was added to the error time. 

A summary of the results appears in Table 4
and shows that task times were considerably
longer with TECO and Wylbur
than with NLS and Wang (with statistical
significance at the 0.02 level). The coefficients
of variation for the total-time data
indicate that differences between subjects
account for about as much variability as
differences between editors. For the error-free
data, however, more variation can be
attributed to the editors than to the subjects.
Thus much of the subject-to-subject
variation must be due to error rates. 
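
The between-editor figure quoted above can be checked with a few lines of arithmetic. This sketch computes the coefficient of variation (standard deviation divided by mean) of the four per-editor means in Table 4. Two assumptions are made: the TECO entries are read as 47 and 41, and the sample (n - 1) standard deviation is used. Whether Roberts used the sample or the population form is not stated here; the sample form is the one that reproduces the reported 0.30.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


# Per-editor mean task times from Table 4 (TECO values are assumed readings).
total_time = {"TECO": 47, "Wylbur": 42, "NLS": 30, "Wang": 24}
error_free = {"TECO": 41, "Wylbur": 35, "NLS": 24, "Wang": 22}

cv_total = coefficient_of_variation(list(total_time.values()))
cv_error_free = coefficient_of_variation(list(error_free.values()))

print(f"C.V. of means, total-time data: {cv_total:.2f}")
print(f"C.V. of means, error-free data: {cv_error_free:.2f}")
```

The between-subject figures (0.28 and 0.18) cannot be recomputed this way because the per-subject times are not given in the table.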

Roberts also applied the keystroke model 
of Card et al. to her data to predict task 
times. The editing sequence chosen for the 
model was a sequence of optimal methods 
for each subtask in its context. Since some 
subtasks were larger than unit tasks as de- 
fined for the keystroke model, each get- 
locate-modify-verify cycle was considered 
a new unit task. 

Results are shown in Figure 4. Ignoring 
the error data — since the keystroke model 
assumes that the data are error-free — the 
bar graph shows that the predictions were
25-50 percent too low, and therefore not
nearly as accurate as expected in light of
the validation experiments performed on
the keystroke model itself. Further investigation
of the data obtained on TECO (an
automatic record of all keystrokes was
kept) showed that application of the model
to the actual keystroke sequence still accounted
for only 87 percent of the error-free
time. The remaining time was attributed
to unknown mental activities.

To explain the discrepancy, it must be
remembered that in the keystroke-model
validation experiments, methods for each
unit task were suggested and practiced before
the experiment, task acquisition was
nominally excluded, and error data were
completely eliminated, whereas in Roberts's
data small errors were included in the reported
times. Moreover, prediction is more
difficult when users choose their own (possibly
suboptimal) editing sequences. The
keystroke model predictions provide, how- 
ever, an upper bound on how well a skilled 
user could perform if he or she were so 
practiced that method selection time would 
be nil, choices optimal, and entry flawless. 
A comparison of predicted time and ob- 
served time therefore provides useful infor- 
mation since a relatively large difference 
indicates that the editor under considera- 
tion is difficult to use optimally. 
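
The comparison of predicted and observed time can be sketched as follows. The operator durations are the values commonly quoted for the keystroke model of Card et al. (keystroke 0.2 s for an average skilled typist, pointing 1.1 s, homing 0.4 s, mental preparation 1.35 s); they are not given in this section and should be treated as assumptions, as should the function names and the example numbers.

```python
# Commonly quoted keystroke-model operator times in seconds (assumed values).
OPERATOR_SECONDS = {
    "K": 0.20,  # press a key
    "P": 1.10,  # point at a target with a pointing device
    "H": 0.40,  # move hands between keyboard and pointing device
    "M": 1.35,  # mentally prepare for the next action
}


def predict_seconds(operators: str, response_seconds: float = 0.0) -> float:
    """Sum operator times for an error-free method, plus system response time."""
    return sum(OPERATOR_SECONDS[op] for op in operators) + response_seconds


def shortfall_percent(predicted: float, observed: float) -> float:
    """How far the prediction falls below the observed time, in percent."""
    return 100.0 * (observed - predicted) / observed


if __name__ == "__main__":
    # Hypothetical unit task: think, then type a four-character command.
    predicted = predict_seconds("MKKKK")
    observed = 3.0  # hypothetical stopwatch measurement
    print(f"predicted {predicted:.2f} s, observed {observed:.2f} s, "
          f"shortfall {shortfall_percent(predicted, observed):.0f}%")
```

A large shortfall for a given editor is the signal described above: the editor is hard to use in the optimal way the model assumes.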

1.4 The Effects of Computer Response Time

The importance of system response time is universally recognized.
Since the earliest days of interactive computing, researchers have
discussed the effects of system delay and unpredictability on user
productivity and satisfaction [Carb68]. Although some controlled
experiments have been conducted and have produced interesting and
worthwhile results, they have addressed the broader issues of
interactive computing such as information retrieval, interactive
design, and problem solving rather than text editing [Bokh71, Good78,
Gros76, MilL77b, Yule72]. For all of these interactive activities,
mental preparation is more intense and varied than for routine editing
tasks and is thus more sensitive to interruption, distraction, and
unusual delays. Editing should proceed at a rapid pace; for most
requests any perceptible delay may prove irritating. For a good
overview of system response time issues in the broader context of
interactive computing, see Shneiderman [Shne79].

R. B. Miller maintains that an immediate 
response is not a universal requirement in 
interactive computing and lists various 
classes of user actions and purposes at terminals
that appear to allow or require different
system delays [MilR68]. “Goals,” “chunks,”
and “closures” all play an important
role in determining acceptable delay.
Classifications that apply to editing include 
echo characteristics, “conversational” re- 
quests, searches, task completion, log-on/ 
log-off, and recovery from system failure 
(Figure 5). It is emphasized that these es- 
timates are “best guess” conjectures of a 
behavioral scientist who specializes in com- 
puter usage; he urges that they be verified 
by extensive system studies in carefully de- 
signed, real-life task environments. 

Classification                    Acceptable Time
                                  Delay (Seconds)

Echo characteristics              <0.1    Examples are the click of a typewriter key and
                                          visual feedback from the platen or CRT
Conversational requests           [...]   Most editing requests fall into this category
Searches
  String search requests          [...]   Two second delays would be preferable, but if the
                                          user perceives the request as a complex inquiry,
                                          up to [...] seconds is acceptable
  Browsing, page-by-page search   <0.5    Longer delays, at least for the appearance of the
                                          first several lines of the next page, are
                                          intrusive on the continuity of thought
Task completion
  Program execution               [...]   There is a sense of closure, but if continuity is
                                          desired, the delay should not be too long
  New editing assignment          10-15   A brief rest is nice
Log-on/log-off                    <15     Captivity of more than 15 seconds can easily
                                          demoralize the user and reduce motivation to work
Recovery from system failure      <15     If recovery will take more than 15 seconds, the
                                          system should inform the user how long he/she
                                          might have to wait

Figure 5. Subset of Miller’s response-time classification that is applicable to text editors.

Some results from controlled experiments on the effects of system
response time in interactive systems are applicable to editing, but
care must be exercised in transferring results from one environment to
another. L. H. Miller investigated the effects of varying CRT display
rates and output delays on user performance and attitudes in a series
of message retrieval tasks [MilL77b]. He concluded that increasing the
display rate from 1200 to 2400 baud produced no significant performance
or attitude changes, but that increasing the variability of the output
display rate produced a significant deterioration in both performance
and attitude. It should be noted that 1200 baud is much faster than the
average person’s reading rate, which is closer to about 300 baud
(approximately 360 words per minute); presumably too slow a
communications rate would have a deleterious effect. In the editing
environment, however, there could be a significant difference between
1200 and 2400 baud for some situations, for instance, when skimming in
search of an item to be modified. (The typical communication rates for
terminals are rising gradually: 110-baud teletypewriter connections
common a few years ago have been largely replaced by 300-, 2400-, and
4800-baud dial-up lines. Most stand-alone computers have 9600-baud
screen display rates.) The effects of variability, as shown in the
experiment, are also likely to be detrimental in the editing
environment; such variability is not uncommon with line-by-line
transmission.

Grignetti and Miller conducted experiments to explore methods to
motivate users to adopt behavior patterns that would improve overall
system performance [Grig70]. For example, text modification could be
performed with either a machine-cycle intensive search-and-replace
command or a keystroke intensive delete-and-enter sequence. They
investigated both controlled computer response time and an imposed
cost-reward structure to regulate user requests. Their experiments
demonstrate that it is possible to provide incentives that affect
choices between alternative methods of accomplishing a task. They
discovered, however, that even with very explicit monetary incentives
users do not make optimal choices. Instead, the strategies appear to be
based on some perceived cost that is related but not identical to the
actual cost.
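
The equivalence quoted earlier between 300 baud and roughly 360 words per minute can be reproduced with a short conversion. The sketch assumes asynchronous transmission at 10 bits per character (start bit, 8 data bits, stop bit) and 5 characters per word; neither figure is stated in the text, but together they yield the quoted number.

```python
def baud_to_words_per_minute(baud: float,
                             bits_per_char: int = 10,
                             chars_per_word: int = 5) -> float:
    """Convert a line rate into an equivalent display rate in words per minute.

    bits_per_char and chars_per_word are illustrative assumptions.
    """
    chars_per_second = baud / bits_per_char
    return chars_per_second * 60 / chars_per_word


if __name__ == "__main__":
    for rate in (110, 300, 1200, 2400, 9600):
        print(f"{rate:>5} baud is about {baud_to_words_per_minute(rate):.0f} words per minute")
```

Under these assumptions 1200 baud already delivers text about four times faster than the quoted reading rate, which fits the finding that doubling it to 2400 baud changed little in a reading-dominated task.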

In another experiment Grossberg and his
colleagues studied response times in problem-solving
activities with mean delays of
1, 4, 16, and 64 seconds but with individual
delays varying widely and unpredictably
about the mean [Gros76]. In harmony with
show that subjects modified their problem- 
solving tactics as the mean delay in- 
creased — they seemed to become more cau- 
tious and deliberate. Surprisingly, however, 
the mean delay did not have a definite 
effect on the time required to reach a solu- 
tion. In transferring the results to the edit- 
ing environment, one might conclude that 
system response time has little effect on 
performance; users would simply adjust 
their tactics to make the best use of their 
time on the system. A major difference 
between a problem-solving and an editing 
environment, however, is the usefulness of 
extra mental time during long system de- 
lays. Since editing is typically a routine 
cognitive skill, additional mental prepara- 
tion time beyond what is necessary to de- 
cide what to do next would likely interfere 
with the task completion rate. Moreover, 
since the subjects in this experiment were 
the experimenters themselves, they were 
motivated to complete the assigned tasks 
successfully, but this does not imply that 
other users would tolerate such abuse. 

2. EDITOR DESIGN CONSIDERATIONS 

In this section we consider aspects of edi- 
tors that affect ease of use. Ease-of-use 
considerations are particularly important 
for novice and casual users, but also affect 
experts who wish to invoke infrequently 
used editor features or who use several 
different editors on a regular basis. Most 
authors agree that ease of use depends pri- 
marily on the command language and un- 
derlying structure of the editor and may 
also depend on the nature and availability 
of user aids. Relevant command language 
features may include the number of com- 
mands, the command vocabulary, mne- 
monics, abbreviations, arguments, defaults, 
and macros. The structure of an editor, 
though not independent of the command
language, can be described in terms of editor
states or modes, context dependencies,
and state transitions. In essence, design 
trade-offs balance the proliferation of 
“powerful” commands that depend heavily 
on the editor’s state against a small number 
of “basic” commands executed from a min- 
imal number of states. Our inability to 
learn, remember, and effectively use large, 
complex command sets, balanced against 
our desire to achieve editing objectives with 
a minimum expenditure of effort and time, 
limits the range of reasonable design op- 
tions. 

In this section we survey four approaches 
to editor design and evaluation: popular 
wisdom, observation, analysis, and con- 
trolled experiments. Popular wisdom stems 
mainly from introspection and is typically 
influenced by anecdotal evidence. Obser- 
vation encompasses field observations and 
surveys taken from users. Formal analyses 
address the issues of syntax and semantics 
of command language grammars. A few 
controlled experiments on editor learnabil- 
ity and user friendliness have been per- 
formed and yield valuable insights into ed- 
itor design. 

2.1 Popular Wisdom 

The literature extols the virtues of many 
text editors [Coul76, Deut67, Haze80, 
TeiW79, VanD71] and is replete with lists 
of suggestions on how to create a better 
human-oriented interface (see, for example, 
Gain78, Jone78, Hans71a, Mart73,
Rous73, Wass73). Several of these lists are 
also compiled in Shne79. As a single rep- 
resentative example of these design guide- 
lines, Hansen’s “User Engineering Princi- 
ples” are shown in Figure 6. 

Hansen expounds upon the meaning of 
each of these principles. For example, 
for predictable behavior, he explains 
[Hans71a]:

The importance of such behavior is that the user 
can gain an “impression” of the system and under- 
stand its behavior in terms of that impression. Thus 
by remembering a few characteristics and a few 
exceptions, the user can work out for himself the 
details of any individual operation. In other words, 
the system ought to have a “Gestalt” or “personality”
around which the user can organize his perception
of the system.


These principles were developed during
the design of Emily [Hans71b], a sophisticated
program editing system for PL/1 in
which text is created, viewed, and modified
in terms of the structure imposed by the
syntax of the programming language. (We
note that at a later time and in a different
setting Hansen did run a controlled experiment
that tested in part the notion of
predictable behavior [Hans78].)

Typically, guidelines such as Hansen’s 
contain reasonable advice and guidance but 
are often vague and sometimes even con- 
tradictory. They suggest little more than 
what a good designer knows from experi- 
ence and common sense and do not lead to 
quantitative methods of evaluating editors. 

Martin [Mart73] and Engel and Granda 
[EngE75] present much more comprehen- 
sive guidelines. Martin discusses the user- 
computer interface, taking into account 
various hardware configurations, user abil- 
ities and objectives, and implementation 
considerations. Engel and Granda consider 
seven general categories; some of these rec- 
ommendations are presented in Section 3.2. 
These guidelines were based on a thorough 
survey of information available at the time 
they were written and represent the assim- 
ilation of experience, informal observations, 
behavioral experiments, and principles 
thought to be applicable. 

Another approach to the dissemination 
of popular wisdom is taken by Singer et al. 
[Sing77, Ledg81]. They present an anno- 
tated user’s guide for an editor, the PAS- 
CAL Assistant, with the intent to illumi- 
nate the human engineering design consid- 
erations and to explain the principles mo- 
tivating their decisions. Figure 7 provides 
an example of comments about an aspect 
of the editor. Although they make a con- 
scious effort to rely on psychological prin- 
ciples where possible, they freely admit that 
often their only guide was intuition and 
experience. 

2.2 Observation 

Opinions about the user-perceived quality of text editors abound, but
actual knowledge is scarce. A long-range objective is to obtain a
criterion to measure quality from the user’s point of view, but a first
step is to identify relevant system properties and user abilities and
learn how to obtain data about them [Embl81a].

One possible approach to identifying relevant properties and abilities
is through questionnaires [Dzid78]. On the basis of a pilot study,
Dzida and his colleagues assumed that user-perceived quality could be
seen as a multidimensional concept with each dimension representing an
independent characteristic of the overall quality. They searched for
these quality characteristics by means of questionnaires and
statistical analysis of the results. First, [...] experienced users
were asked to state relevant human-oriented system requirements; later,
about 600 persons were asked to judge 100 system requirements with
respect to their importance in user-perceived quality. Questionnaires
were returned by 233 persons, about half of whom were among those who
initially stated the relevant human-oriented system requirements and
half of whom were members of the German chapter of ACM who had not
previously participated.

User Engineering Principles

First principle: Know the user

Minimize memorization
    Selection not entry (this agrees with [...])
    Names not numbers
    Predictable behavior
    Access to system information

Optimize operations
    Rapid execution of common operations
    Display inertia (the display should change as little as possible to
        carry out a request)
    Muscle memory (subconscious muscle memory should be exploited, for
        instance, to maintain maximum keying rate)
    Reorganize command parameters (keep frequent commands simple;
        infrequent commands may be more complex)

Engineer for errors
    Good error messages
    Engineer out the common errors (the design may need to be altered
        to inhibit frequent user errors)
    Reversible actions (it should be possible to return the system to a
        previous state)
    Redundancy (provide more than one means to an end)
    Data structure integrity (data should not be lost regardless of
        system or hardware malfunction)

Figure 6. Hansen’s table of user engineering principles (adapted from Hans71a).

User’s Guide

Active Behavior: When you send the Assistant a request, it becomes
active and attempts to satisfy your request. It does this in three
stages*:

1. Verification: The Assistant determines whether or not your request
   makes sense, and makes any necessary assumptions that it can when
   specific details are not given.

2. Performance: If the verification stage was completed successfully,
   the Assistant will satisfy your request. If the operation requested
   is at all time-consuming, the Assistant may indicate its progress at
   various intervals.

3. Completion: After your request has been satisfied, the Assistant
   indicates the final result of its actions and again becomes
   attentive.

Rationale

* There is a body of psychological evidence (see, for example,
Thorndike and Rock, 1934) which suggests that people “learn without
awareness.” One implication of these results is that the users of a
computer system will infer underlying principles even if they are
unaware of doing so.

The Assistant’s behavioral goals are not merely “sugaring,” but are
accurately reflected in its responses. These goals are intended to help
the user make reasonable inferences about what the Assistant will do
with a particular request. For example, the first goal, verification,
ensures that no request will be executed unless it makes some sense
semantically. In some cases, this implies that significant static
prechecking must be performed. This seems a small price to pay for
relieving the user of the burden of correcting damage done by a
technically legal but senseless request.

Figure 7. Excerpt from “The Annotated Assistant: A Step Towards Human Engineering” [Sing77].

Seven categories that accounted for 44 
percent of the variation in the data were 
identified. These categories were denoted 
self-descriptiveness, user control, ease of 
learning, problem-adequate usability
(minimize details the user must know and 
deal with), correspondence with user ex- 
pectations, flexibility in task handling, and 
fault tolerance. These factors were isolated 
mathematically and at least five of them 
were shown to be statistically reliable and 
valid. The importance of some factors, how- 
ever, varied widely depending on specific 
user groups. For instance, casual users felt, 
much more so than did regular users, that 
ease of learning was important. Since the 
seven-factor solution explained 44 percent 
of the variation in the data, the authors 
concluded that an empirical model for as- 
sessing user-perceived quality had been es- 
tablished. 

Another approach to identifying relevant 
characteristics is through direct observa- 
tion of users on existing systems. Kennedy 
observed a large sample of clerical and sec- 
retarial staff learning to use an interactive 
system and gained insight into the effect of 
anxiety and the role of the system, the 
instructor, and reference manuals in the
learning process [Kenn75]. Initially, this
field observation began as a controlled ex- 
periment to investigate factors that might 
affect learning, such as attitude toward 
computers, availability of manuals, self- 
learning from the system, and verbal assist- 
ance from an instructor. In the experiment 
none of these factors was shown to be sig- 
nificant, but observations did lead to sys- 
tem improvement and more effective train- 
ing. For the particular interactive system, 
Kennedy observed that (1) self-teaching 
through trial and error with feedback from 
the machine seemed most effective, (2) sub- 
tle distinctions in technical terminology 
were inadequately explained, and (3) anxi- 
ety decreased learning, particularly during 
the subject's first computer session. Obser- 
vations of this nature yield new hypotheses 
that can be tested. Even when experiments 
do not produce expected results, they may 
provide useful insights and experience. 

Users may also be observed indirectly by 
instrumenting editors to extract and time- 
stamp editing sessions. Hammer and Rouse 
collected the sequence of keystrokes and 
the elapsed time between keystrokes for 
researchers writing their own programs and 
reports using TECO and SOS [Hamm79].
The data collected were mapped into
sequences of uniform editor primitives (e.g.,
insert one line, delete many characters) in
order to make a comparison between the
data from the two editors possible. As an
indicator of the editing sequence, digrams
of uniform editing primitives were counted,
and the counts were converted into transition
probabilities, resulting in a Markov
model. Statistical tests applied to the Mar- 
kov model showed that differences between 
editors and between tasks were no larger 
than the differences between users. For 22 
of the users involved in both program and 
document editing, statistical tests showed 
significant differences between tasks for 
only 25 percent of the users; most of the 
observed users edited programs and docu- 
ments by the same technique. Hammer and 
Rouse have begun to apply their Markov 
model to higher level editing operations in 
an attempt to capture more task-specific 
behavior and to separate task differences 
from individual differences. They also plan 
to study errors and the delay between key- 
strokes. 
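
The digram-counting step described above is simple to sketch. The code below counts adjacent pairs of editing primitives in one logged session and normalizes each row into transition probabilities, giving a first-order Markov model. The primitive names and the sample session are hypothetical; they are not taken from Hammer and Rouse's data.

```python
from collections import Counter


def transition_probabilities(primitives):
    """Estimate first-order transition probabilities from a primitive sequence."""
    # Count digrams: each adjacent (current, next) pair in the session.
    digrams = Counter(zip(primitives, primitives[1:]))

    # Total number of transitions leaving each primitive.
    row_totals = Counter()
    for (source, _), count in digrams.items():
        row_totals[source] += count

    # Normalize each row so its probabilities sum to 1.
    matrix = {}
    for (source, target), count in digrams.items():
        matrix.setdefault(source, {})[target] = count / row_totals[source]
    return matrix


# Hypothetical session expressed in uniform editor primitives.
session = ["locate", "insert_line", "locate", "delete_chars",
           "locate", "insert_line"]
probs = transition_probabilities(session)

if __name__ == "__main__":
    for source, row in probs.items():
        print(source, row)
```

Comparing such matrices across users, editors, and tasks is what the statistical tests mentioned above operate on; the tests themselves are outside this sketch.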

2.3 Syntactic Analysis 

Besides user feedback through question- 
naires and direct observation, another way 
to gain insight into relevant characteristics 
is through an analysis of formal descrip- 
tions of command language grammars 
[Ledg78]. Although formal grammatical
descriptions have not generally been applied
to study the subjective aspects of editors,
Reisner [Reis79] and Anandan
[Anan79] illustrate through examples how
formal descriptions can be used in human 
factors research. Reisner describes user ac- 
tions at a terminal for two command lan- 
guages by means of a BNF-like grammar. 
Aspects of the formalism Reisner consid- 
ered to be useful for comparison include the 
number of different terminal symbols, the 
lengths of the terminal strings, and the 
number of rules necessary to describe the 
structure of some set of terminal strings for 
a given task. An examination of these as- 
pects of the formalism led to predictions 
about user behavior. In order to test these 
predictions, an exploratory experiment was 
conducted in which subjects learned both 
command languages and performed tasks 
designed to reveal information about the 
predictions. Reisner observed that subjects
performed consistently with the predictions
(Figure 8).

Sample Predictions

1. Learning and/or remembering how to select shapes in HOBART 1 should
   vary in difficulty.
2. Learning and/or remembering how to select shapes in HOBART 2 should
   not vary in difficulty.
3. Learning and/or remembering how to select a shape in HOBART 2 should
   be easier than selecting the corresponding HOBART 1 shape.

Results Related to These Predictions

Number of subjects (of ten) unable to select the given shape:

Task                  HOBART 1   HOBART 2

Line                      0          0
Box                     [...]        1
Circle                    8          0
Continuous line           2          0
Continuous box          [...]        0
Continuous circle         9          1

Figure 8. Some predictions from an analysis of the grammars of HOBART 1
and HOBART 2, and results from the exploratory experiment conducted
(adapted from Reis81, ©IEEE 1981). (HOBART 1 and HOBART 2 are two
versions of an interactive graphics system for creating slides that
have essentially the same function but differ in the design of the
human interface.)

In a similar vein, Anandan developed state transition diagrams for two
editors, NUROS [UNCN79] and SIMPLE [Embl81b], and counted the number of
states, number of different commands, number of commands issued from
each state, total number of context dependencies, and average number of
keystrokes necessary per command (Table 5). The data reveal differences
between the editors and correspond with informal observations of the
differences in ease of use.

Table 5. State Transition Diagram Summaries for NUROS and SIMPLE (a)

NUROS                                     SIMPLE
States                      Number of     States                      Number of
                            Commands                                  Commands

1. Begin mode                   6         1. Command selection mode      11
2. Ready-to-write mode          2         2. Text insertion mode          2
3. JCL mode                     3
4. File-display-write mode      9
5. Edit mode                    5

                                                      NUROS    SIMPLE

Total number of commands issued from states             25       13
Number of distinct commands                             17       13
Number of commands issued from more than one state      7 (b)     0
Average number of necessary keystrokes                  3.9      2.5

(a) Adapted from Anan79.
(b) Six from two different states and one from three different states.

Several different descriptive notations appear useful in this approach
to assess user-perceived quality, and it is not clear which is best.
Besides those mentioned here, Moran’s command language grammar [Mora81]
looks promising since it describes all levels of a command language
system, from the conceptual to the physical device level.

Beyond the choice of formalism, however, lies the more difficult
problem of attaching measures of user effort to the quantities that can
be extracted from the formalism. It seems relatively easy to single out
a particular factor, collect data from two comparison languages, apply
some measure, compare the results, and declare that one feature is
better than another, but it is difficult to determine how much better
and how much weight the factor should have. Moreover, when complex
interactions among several factors are considered, the results might
not turn out as expected. Perhaps some ideas adapted from software
science studies [Hals77] may provide clues to the answers to these
questions.
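
Counts of the kind Anandan extracted from state transition diagrams are easy to compute once an editor is written down as a mapping from states to the commands legal in each state. The sketch below uses a made-up three-state editor; the state names, command names, and the `summarize` function are hypothetical and do not describe NUROS or SIMPLE.

```python
from collections import Counter


def summarize(state_commands):
    """Summary counts for a {state: set_of_commands} description of an editor."""
    per_state = {state: len(cmds) for state, cmds in state_commands.items()}
    # For each command, the number of states from which it can be issued.
    usage = Counter(cmd for cmds in state_commands.values() for cmd in cmds)
    return {
        "states": len(state_commands),
        "commands_per_state": per_state,
        "total_issued": sum(per_state.values()),
        "distinct": len(usage),
        "multi_state": sum(1 for n in usage.values() if n > 1),
    }


# Hypothetical editor: "find" and "escape" are each legal in two states.
toy_editor = {
    "command": {"insert", "delete", "find", "quit"},
    "insert": {"escape", "backspace"},
    "search": {"find", "escape"},
}
summary = summarize(toy_editor)

if __name__ == "__main__":
    print(summary)
```

The same bookkeeping holds for the NUROS figures in Table 5: 17 distinct commands, plus one extra occurrence for each of the six commands issued from two states and two extra for the command issued from three, gives the 25 commands issued in total. As the text notes, turning such counts into a measure of user effort is the harder, unsolved step.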

2.4 Controlled Experiments 

Since very few controlled experiments have 
been conducted to assess the quality of 
text-editor command languages and system 
structure, it is necessary to draw informa- 
tion from behavioral experiments that bear 
on some aspect of text editors. One must be 
careful when transferring the results of a 
behavioral experiment from one context to 
another — the assumptions may not hold 
and the context may be sufficiently differ- 
ent to invalidate the results completely. 
With this caution in mind we examine a 
few examples. 

Freedman and Landauer [Free66] and
Chin-Chance [Chin78] have obtained data
that indicate the usefulness of the initial 
letter of a word as a recall and discrimina- 
tion clue. Permitting first-letter abbrevia- 
tions for command names gains support 
from these data. 


Newman investigated imperative state- 
ments with preconditions of two logical 
types, “standing” and “one-shot”
[Newm77]. Standing preconditions are assumptions
that normally hold true in a
given situation. For example, the statement
“if you wish to edit file F, issue the command
‘open F’ ” has a standing precondition,
“if you wish to edit file F.” One-shot
preconditions are exceptions or are events
that occur only once during the accomplishment
of a task. For example, the statement
“if you wish to edit file F but it doesn’t
exist, issue the command ‘create F’ ” has a
one-shot precondition indicated by the
phrase “but it doesn’t exist.” The results
tend to support the hypothesis that the 
semantic processing of one-shot commands 
is more complex (because more conditions 
must be processed). Thus, for creating a 
new file, for example, Newman suggests 
that it may be preferable to design an editor 
with only a single command to open and 
create a file. 

A few controlled experiments have been 
conducted that specifically address com- 
mand language structure and learnability 
aspects of text editors [Robe79, Walt74,
Ledg80]. Walther and O’Neil investigated
interface flexibility; that is, whether user
options are good for everyone’s performance
and, if not, for which kinds of users they
are helpful (or detrimental). Two different
versions of a text editor were specifically
designed and constructed for the experiment.
The inflexible version required all
commands to be spelled out fully: no
abbreviations, extra spaces, or defaults
were allowed. The flexible version permitted
as much freedom as possible so long as
ambiguities could be resolved. Measures of
user performance were derived from
time-for-task and syntax-error-frequency
data. The somewhat surprising results
indicate that interface flexibility is not
uniformly best for all users. In the
experiment users were more prone to make
syntax errors when offered more flexibility,
but completed tasks faster than those not
having the options. The exceptions were the
novices, who worked more rapidly without
options than with them. Unfortunately
Walther and O’Neil’s paper does not include
their quantitative results [Walt74].

Figure 9. Average learning curves [Robe79]. □, TECO; •, TECO (second
instructor); ○, Wylbur; △, NLS; ×, Wang.

Roberts investigated the question of how
long it takes a beginner to acquire the
ability to perform basic editing operations:
insertion, deletion, replacement, moving,
copying, splitting, and merging [Robe79].
Under the guidance of an instructor, sixteen
subjects, four for each of the four text
editors (TECO, Wylbur, NLS, and Wang),
learned to perform these basic tasks. A
task-oriented instructional method was
standardized across all editors; the
instructor taught and quizzed subjects on
the techniques and commands needed to
perform five different teach-followed-by-quiz
cycles covering the basic tasks. Time spent
in both being taught and quizzed was
controlled by the subject, and the results
are presented as a learning curve with the
number of tasks the subject had shown the
ability to perform plotted against time
(Figure 9). Since the learning rate
undoubtedly depends strongly on the
particular instructor, a second instructor
ran the experiment on four additional
subjects using the TECO text editor. A
one-way analysis of variance of the data
shows that there is a significant difference
(p < .01) among systems. The t tests between
systems show a significant difference
between TECO and each of the other systems
(p < .05), but no difference among the other
three systems or between the two sets of
TECO experiments.

Roberts also explored the possibility of
analytically predicting the results of these
experiments in an attempt to find a less
expensive means of evaluating alternative
systems and to gain some insight into which
features of editors affect the learning rate.
She counted the commands (n1) and items
(n2: the entities such as verbs, arguments,
and terminators that comprise commands)
necessary to learn the basic operations for
each of the four editors. Table 6 shows the
results of these counts plus average counts
of commands per task (m1) and items per
task (m2) and the result of a regression
analysis.

TABLE 6. PREDICTION OF LEARNING RATES (a)

                                               TECO     Wylbur   NLS      Wang     Standard Error (b)
                                                                                   (min/task)
Observed learning rate (minutes per task)      19.5     8.2      7.7      6.2
Predicted learning rate from command           17.4     4.9      9.9      9.9      4.1
  vocabulary (n1)                              (14) (c) (9)      (11)     (11)
Predicted learning rate from item              12.4     9.9      13.9     7.9      6.8
  vocabulary (n2)                              (34)     (29)     (37)     (25)
Predicted learning rate from commands          19.4     7.5      6.6      8.4      1.8
  per task (m1)                                (4.0)    (1.3)    (1.1)    (1.5)
Predicted learning rate from items             17.5     11.2     7.4      4.5      2.8
  per task (m2)                                (11.5)   (8.0)    (5.9)    (4.3)

(a) Adapted from Robe79.
(b) [Σ (t_p − t_o)² / (N − 2)]^(1/2), where t_p is the predicted rate, t_o is the observed rate, and N (= 4) is the number of
trials.
(c) The numbers in parentheses represent the command and item counts used to predict each learning rate.

It is interesting to note that of these 
counting methods, command counts seem 
to predict learning rates better than item 
counts, a somewhat counterintuitive result 
since the command count is a cruder meas- 
ure. Furthermore, it would appear that it is 
not the absolute number of command (or 
item) types that matters, but the number 
of commands (or items) that must be strung 
together to perform an elementary editing 
task. Since these results are based on only 
four sample points, the results are far from
definitive, but the hope remains that
analytical predictions of this nature may
someday be regarded as valid measures for
assessing an editor’s strengths and
weaknesses with respect to learnability.
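
The comparison among the four counting methods can be checked with a few lines of arithmetic. The sketch below is only illustrative: it applies the standard-error formula from the footnote of Table 6 to learning rates transcribed from that table, and the transcribed values should be verified against [Robe79] before being relied on. The function and variable names are ours, not Roberts’s.

```python
import math

# Learning rates in minutes per task, transcribed from Table 6
# (adapted from [Robe79]); treat the transcription as an assumption.
OBSERVED = {"TECO": 19.5, "Wylbur": 8.2, "NLS": 7.7, "Wang": 6.2}
PREDICTED = {
    "n1": {"TECO": 17.4, "Wylbur": 4.9, "NLS": 9.9, "Wang": 9.9},
    "n2": {"TECO": 12.4, "Wylbur": 9.9, "NLS": 13.9, "Wang": 7.9},
    "m1": {"TECO": 19.4, "Wylbur": 7.5, "NLS": 6.6, "Wang": 8.4},
    "m2": {"TECO": 17.5, "Wylbur": 11.2, "NLS": 7.4, "Wang": 4.5},
}


def standard_error(predicted, observed):
    """[sum((t_p - t_o)^2) / (N - 2)]^(1/2) over the N editors."""
    n = len(observed)
    squared = sum((predicted[e] - observed[e]) ** 2 for e in observed)
    return math.sqrt(squared / (n - 2))


def rank_predictors():
    """Return (predictor, standard error) pairs, best predictor first."""
    errors = {name: standard_error(p, OBSERVED) for name, p in PREDICTED.items()}
    return sorted(errors.items(), key=lambda pair: pair[1])


if __name__ == "__main__":
    for name, err in rank_predictors():
        print(f"{name}: {err:.2f} min/task")
```

With these inputs the per-task counts (m1, m2) rank ahead of the vocabulary counts (n1, n2), and command counts rank ahead of item counts within each pair. Small differences from the tabulated standard errors are to be expected, since the table entries are themselves rounded.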

Ledgard et al. hypothesized that an
interactive system should be based on
familiar, descriptive, everyday words and
English-like phrases [Ledg80]. They observe
that this hypothesis is not generally ac- 
cepted in the design of the vast majority of 
interactive languages. To test their hypoth- 
esis, they compared two text editors: a com- 
mercially available Control Data Corpora- 
tion editor supplied with NOS that has a 
typical notational syntax, and a remodeled 
version of this editor with identical power 
but with its syntax altered so that its com- 
mands are all based on legitimate English 
phrases. Twenty-four paid subjects, distrib- 
uted among three levels of familiarity (eight 
novice, eight intermediate, and eight ex- 


pert), participated in the experiment. Each 
subject performed editing tasks on both 
versions of the editor; half of the subjects 
at each level of familiarity tried the nota- 
tional version first and half tried the Eng- 
lish-based version first. 

The results are summarized in Table 7.
Subjects completed more tasks with the
English-based editor (p < .001). The error
rate for the English-based editor was lower 
(p < .01). The editing efficiency for the 
English-based editor was better (p < .01). 
Since both editors were semantically iden- 
tical and since the performance on the Eng- 
lish editor was strikingly superior, this work 
demonstrates conclusively that the surface 
syntax of an editor is surprisingly important 
from a human engineering point of view. 
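
The three measures reported in Table 7 are simple ratios of counts taken from a session log, as defined in the table’s footnotes: completion rate is (COR − ERR)/TOT-COR, error rate is (SYN + SEM)/NUM-CMDS, and editing efficiency is (POS − NEG)/NUM-CMDS. The following sketch shows the arithmetic; the class name, field names, and the sample counts are hypothetical and are not data from [Ledg80].

```python
from dataclasses import dataclass


@dataclass
class SessionCounts:
    """Counts for one editing session (field names follow Table 7)."""
    cor: int       # indicated corrections made to the text
    err: int       # erroneous changes made to the text
    tot_cor: int   # total indicated corrections requested
    syn: int       # syntactically ill-formed commands
    sem: int       # semantically meaningless commands
    num_cmds: int  # total commands issued
    pos: int       # commands that improved the text
    neg: int       # commands that degraded the text


def completion_rate(c: SessionCounts) -> float:
    """Percentage of requested corrections completed, net of bad changes."""
    return 100.0 * (c.cor - c.err) / c.tot_cor


def error_rate(c: SessionCounts) -> float:
    """Percentage of issued commands that were erroneous."""
    return 100.0 * (c.syn + c.sem) / c.num_cmds


def editing_efficiency(c: SessionCounts) -> float:
    """Net percentage of issued commands that improved the text."""
    return 100.0 * (c.pos - c.neg) / c.num_cmds


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    sample = SessionCounts(cor=20, err=2, tot_cor=30, syn=3, sem=1,
                           num_cmds=40, pos=25, neg=5)
    print(completion_rate(sample), error_rate(sample), editing_efficiency(sample))
```

Note that a command can be neither positive nor negative (for example, one that merely displays text), so efficiency is not simply the complement of the error rate.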

3. INPUT AND OUTPUT DEVICES 
AND TECHNIQUES 

The work station used in text editing, gen- 
erally called a terminal, consists of a key- 
board used as an input device and a screen 
display or printer used as an output device. 
Screen display terminals may also be 
equipped with auxiliary input devices, such 
as a light pen or joystick, that serve to 
locate a specific item of information on the 
display. This section is a review of the 
psychological and human factors aspects 
underlying the design and use of keyboards, 
screen displays, and pointing devices. 

A thorough discussion of the fundamen- 
tals of physiological psychology applicable 
to the design of computer terminals can be
found in Rupp77. Rupp and Hirsch review
the mechanism of the human visual system 
and explain its optical properties. They are 
careful to consider the requirements of the 
entire population of terminal users, includ- 
ing persons with various visual disorders. 
They take into consideration ambient light- 
ing and derive acceptable ranges for display 
color, refresh rate, brightness, contrast, and 
character quality. A similar treatment,
starting with basic tactile parameters, is 
applied to keyboard design. Finally, the 
design of the entire work station, including 
chair, keyboard, and display height, is re- 
viewed, and recommendations are provided 
for work-rest cycles and ambient condi- 
tions. 

3.1 Key Entry 

Key entry is unquestionably the most com- 
mon means of encoding letters and num- 
bers in computer-readable form. In addition 
to their use in interactive display terminals, 
keyboards are used in keypunches, key-to- 
disk and key-to-tape data-entry systems, 
photocomposers, hex keypads, pushbutton 
telephones, mail-sorting devices, and spe- 
cial-purpose operator consoles. In each case 
finger action is used to convert alphanu- 
meric information to electronic form. 

While there has been little research di- 
rectly on the use of key entry in program 
and text editing, some of the information 
accumulated in over one hundred years of 
research on keyboard operations is directly 
applicable, and other results are suggestive. 
The earliest work was motivated by an 
interest in telegraphy, followed by decades 
of investigation of typing and adding ma- 
chine operations. More recently, there have 
been a number of pertinent experiments on 
keying speed and error rate in electronic 
data entry, postal mail sorters, and tele- 
phone “dialing.” The findings most appli- 
cable to program editing concern the design 
of keyboards, keying rates, and keying 
errors. 

3.1.1 Keyboard Devices 

The first recorded patent for a typewriter
was taken out in England in 1714.
Commercial models, based on a design of
Sholes, Glidden, and Soule and manufactured
by Philo Remington, did not become popular
until the 1880s. The teletypewriter, a direct
antecedent of today’s interactive terminals,
was invented in 1904, and the electric
typewriter, as we know it today, came into
use in about 1935.

TABLE 7. SUMMARY OF PERFORMANCE (a)

                                           English-based   Notational
                                           editor          editor
Mean percentage of editing
tasks completed (b)
  Inexperienced users                      42              28
  Familiar users                           63              43
  Experienced users                        84              74
  Average                                  63              48
Mean percentage of erroneous
commands (c)
  Inexperienced users                      11              19
  Familiar users                           6.9             18
  Experienced users                        5.6             9.9
  Average                                  7.8             16
Mean efficiency (percent) (d)
  Inexperienced users                      43              31
  Familiar users                           53              36
  Experienced users                        58              53
  Average                                  51              40

(a) Adapted from Ledg80.
(b) COMPLETION RATE = (COR − ERR) / TOT-COR
(c) ERROR RATE = (SYN + SEM) / NUM-CMDS
(d) EDITING EFFICIENCY = (POS − NEG) / NUM-CMDS

where
COR       is the number of indicated corrections made to the text;
ERR       is the number of erroneous changes made to the text;
TOT-COR   is the total number of indicated corrections requested;
SYN       is the number of commands that were syntactically ill formed;
SEM       is the number of commands that were semantically meaningless;
NUM-CMDS  is the total number of commands issued;
POS       is the number of commands that resulted in an improvement of the text;
NEG       is the number of commands that resulted in a degradation of the text.

Virtually all of the more than 10 million
typewriters in use in the United States have
the standard “QWERTY” arrangement of
alphabetic keys, which is also duplicated on
computer terminals and key entry devices.
Many other arrangements of keys, designed
for increased speed, have been proposed,
but none have caught on [Alde72]. On the
other hand, anarchy reigns with regard to
punctuation keys and special-purpose keys 
(such as “backspace”). Various proposed 
standards are reviewed in Seib72. One of
the major problems is the putative desira- 
bility of tying the standard to the ASCII 
code in such a manner that the upper- and 
lowercase symbols represented by a given 
key differ only by a single bit in their ASCII 
representation (bit pairing). 
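
The bit-pairing property is easy to state precisely: the unshifted and shifted characters on a key are bit-paired if their ASCII codes differ in exactly one bit. The sketch below tests that condition for a few key assignments; the helper names are ours, and the sample pairs are chosen only to illustrate the difference between a bit-paired assignment (2 with ") and the familiar typewriter assignment (2 with @).

```python
def differing_bits(a: str, b: str) -> int:
    """Number of bit positions in which the ASCII codes of a and b differ."""
    return bin(ord(a) ^ ord(b)).count("1")


def is_bit_paired(unshifted: str, shifted: str) -> bool:
    """True if the two characters on a key differ by a single ASCII bit."""
    return differing_bits(unshifted, shifted) == 1


if __name__ == "__main__":
    for unshifted, shifted in [("a", "A"), ("1", "!"), ("2", '"'), ("2", "@")]:
        print(unshifted, shifted, is_bit_paired(unshifted, shifted))
```

Letters differ from their capitals only in bit value 0x20, and digits differ from the punctuation in the adjacent ASCII column only in bit value 0x10, which is why a bit-paired keyboard can implement the shift key by altering a single bit.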

In numeric keypads the controversy be- 
tween proponents of the ten-key adding 
machine keyboard (1, 2, 3 in the bottom
row) and the pushbutton telephone key- 
board (1, 2, 3 in the top row, zero at the 
bottom) appears to favor the latter 
[Seib72], although many new calculators
still use the adding machine arrangement. 
Many computer terminals preserve the con- 
ventional two-hand numeric arrangement 
along the top row and also provide a sepa- 
rate cluster of numeric keys at the right of 
the keyboard. 

The standard key size is 0.5 inch (1.27 
centimeters) in diameter and the standard 
horizontal spacing is 0.75 inch (1.91 cm).
The slope of the keyboard, the force nec- 
essary for key depression, the key displace- 
ment, and the type of kinesthetic feedback 
from key actuation may be varied within 
wide limits without affecting performance 
[Alde72]. 

The familiar concept of the shift key may 
be extended to “chord” keyboards, of which 
the ultimate expression is the stenotype 
machine [Hill59]. The motivation behind
multipress keyboards is that the number of 
keystrokes per second is limited by the 
ballistic constraints on the fingers rather 
than the information-processing limitations 
of the operator; hence mental encoding of 
the material into group codes may lead to 
greater throughput [Roch78]. Engelbart
and English have experimented with a five- 
key chord set mounted on a “mouse” used 
for cursor control. They claim that the 
chord set is favored by operators over the 
standard two-handed keyboard for entering 
literal strings of fewer than ten characters 
[EngD68]. Seibel suggests that
improvements of up to 150 percent in text
entry rate are obtainable, albeit at a
considerable training cost [Seib72].

Functional encoding, where each individual
key corresponds to a predefined
string of characters, is another possible 
means of increasing throughput. Schoonard 
and Boies, among others, describe
experiments with a “short-type” abbreviation
scheme for the most common words in a
particular type of text and report significant
gains [Scho75]. Because of the length of
training required, these methods are un- 
likely to see widespread application in text 
editing. In any case, most commands in text 
editors, and most keywords and variable 
names in programming languages, are al- 
ready quite short. Some terminals offer al- 
ternative programmable or switchable key- 
board and display symbol sets, such as APL 
characters, while others comprise pro- 
grammable function keys for which visual 
identification is usually provided in the 
form of disposable overlays. A more flexible 
means of encoding keys, where the operator 
must type in only the minimum number of 
letters necessary to provide an unambigu- 
ous operand and the computer then com- 
pletes the rest of the message to provide an 
easily verifiable display, has also been stud- 
ied [Fiel78] and is discussed in the follow- 
ing. 
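
The idea of typing only the minimum number of letters needed to identify an operand can be sketched as prefix matching against a known vocabulary. This is our illustration, not the scheme studied in [Fiel78]: the command vocabulary is hypothetical, and a real system would also have to decide what to display when the prefix is ambiguous.

```python
def complete(prefix, vocabulary):
    """Return the word that prefix identifies, or None if none or several do."""
    if prefix in vocabulary:
        return prefix  # an exact entry always wins
    matches = [word for word in vocabulary if word.startswith(prefix)]
    return matches[0] if len(matches) == 1 else None


def shortest_prefixes(vocabulary):
    """Map each word to the shortest prefix that identifies it unambiguously."""
    result = {}
    for word in vocabulary:
        for length in range(1, len(word) + 1):
            if complete(word[:length], vocabulary) == word:
                result[word] = word[:length]
                break
    return result


if __name__ == "__main__":
    # Hypothetical command vocabulary, for illustration only.
    commands = ["copy", "create", "delete", "insert", "open"]
    print(complete("d", commands))      # unambiguous
    print(complete("c", commands))      # ambiguous: copy or create
    print(shortest_prefixes(commands))
```

In this vocabulary three of the five commands are identified by their first letter alone, while “copy” and “create” need two letters. Echoing the completed word back to the operator provides the easily verifiable display mentioned above.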

The human engineering aspects of the 
data communications protocols of type- 
writer-like terminals used for editing are 
considered by Ossanna and Saltzer 
[Ossa70]. They compare duplex and half- 
duplex connections in terms of desirable 
and undesirable sequencing of input and 
output and of asynchronous read-ahead 
and type-ahead strategies. They also dis- 
cuss — without, however, presenting any ex- 
perimental observations — the effects of 
locking and unlocking the keyboard, local 
and echo printing, line overflow conditions, 
interrupt signals (break key), and acciden- 
tal disconnects. Listed among important 
terminal characteristics are device self- 
identifying features (still available only as 
an expensive option on most terminals) and 
programmed control of secondary features 
such as line feed, horizontal and vertical 
tabulators, character size and code, bit rate, 
parity, and audible whistles or bells. Much 
of the information presented in the paper 
is the direct result of the authors’ experi- 
ence with M.I.T.’s MULTICS and IBM’s 
TSS. 


Ossanna and Saltzer argue convincingly 
that users should not be unnecessarily pro- 
hibited from entering items as fast as they 
can think of and type them. Four mecha- 
nisms they suggest for accomplishing this 
are 

(1) read-ahead or type-ahead operation of 
the terminal, whereby typing action 
characters triggers program execution 
but does not inhibit further input; 

(2) allowance for more than one independ- 
ent component between action charac- 
ters — for example, multiple commands 
on a single line; 

(3) no unnecessary activity by the com- 
mand processor — arguments should be 
accepted immediately after a command 
rather than being necessarily 
prompted; 

(4) provisions for the creation of data files 
or macros as an alternative to direct 
input from the keyboard. 

They conclude that truly convenient ter- 
minal operation can be achieved only 
through coordinated design of the terminal, 
the terminal control hardware, the terminal 
control software, the system’s command in- 
terpreter, the commands, and other pro- 
grams. 
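
Mechanisms (1) and (2) above can be illustrated with a small buffering model. The sketch is ours and is not taken from [Ossa70]: it assumes that the newline is the only action character and that a semicolon separates commands on a line, and the class and method names are hypothetical.

```python
from collections import deque

ACTION_CHAR = "\n"  # assumed action character: end of line
SEPARATOR = ";"     # assumed separator for several commands on one line


class TypeAheadTerminal:
    """Keeps accepting keystrokes while earlier lines await processing."""

    def __init__(self):
        self.pending = deque()  # complete lines not yet processed
        self.partial = []       # characters typed since the last action character

    def key(self, ch):
        """Accept one keystroke; input is never locked out."""
        if ch == ACTION_CHAR:
            self.pending.append("".join(self.partial))
            self.partial.clear()
        else:
            self.partial.append(ch)

    def type_text(self, text):
        for ch in text:
            self.key(ch)

    def run_next(self):
        """Take one buffered line and split it into (command, argument...) tuples."""
        if not self.pending:
            return []
        line = self.pending.popleft()
        return [tuple(cmd.split()) for cmd in line.split(SEPARATOR) if cmd.strip()]


if __name__ == "__main__":
    term = TypeAheadTerminal()
    term.type_text("open F; delete 3\ninsert 5\nco")
    print(term.run_next())
    print(term.run_next())
    print("".join(term.partial))
```

The user in this example has typed two full lines and part of a third before the command processor has consumed anything; nothing is lost, and arguments follow their commands directly without prompting, in the spirit of mechanism (3).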

Perhaps the most detailed consideration 
of the human factors aspects of typewriter- 
like computer terminals designed for time- 
sharing systems is that undertaken by
Dolotta [Dolo70]. The specifications offered
were initially generated by the Character- 
Oriented Conversational Terminal Sub- 
committee of the Time Sharing Project of 
SHARE and had been reviewed prior to 
publication by over one hundred “experts.” 
Consequently, Dolotta may perhaps be
pardoned if some of his remarks sound
dogmatic and if he regards terminal
development as an incessant struggle between
innocent terminal users and callous terminal
manufacturers.

Dolotta provides a list of required and
optional features, including size, weight,
power consumption, audible signals,
documentation, maintainability, keyboard
layout, forward and backward paper feed,
inking mechanisms, noise and vibration
characteristics, character spacing, size and
shape, control panel layout, function keys,


gives some idea of the functions considered.
Interestingly, however, Dolotta did not
foresee the advent of printing terminals
with one-line or several-line displays, such
as are currently available in several
word-processing stations. Furthermore, some
of the suggested functions, such as the
mechanism for locking the keyboard, appear
unnecessary in modern read-ahead terminals.

3.1.2 Typing Speed 

Among the important physiological and
psychological correlates of typing skill are
finger ballistics, reaction time, motor
learning, short-term memory, and human
information-processing capability. The
average single-finger tapping rate is of the
order of six taps per second, with a 20
percent increase in speed from little finger
to index finger and a 2-3 percent increase in
favor of the dominant hand. Good typists
average less than 0.2 second per keystroke
(50 words per minute) for short periods, in
contrast to 0.7 second per keystroke for less
frequent keyboard users [Alde72]. The
interval between the fastest digrams is about
0.08 second for experienced typists,
corresponding to a tapping rate of 12 taps
per second [Fox64]. Digrams typed with alternate
hands are about 25 percent faster than di- 
grams typed with the same hand. These 
digram intervals determine the necessary 
output rate for a display under direct key- 
board control. It is also of interest that 
locking the keyboard for a period almost as 
long as the average digram interval does 
not unduly interfere with typing perform- 
ance after a short adaptation period 
[Alde72]. 
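
The rates quoted above are related by simple conversions, sketched below. The figure of six keystrokes per word (five letters plus a space) is our assumption; it is the value that makes 0.2 second per keystroke equal to 50 words per minute. The function names are hypothetical.

```python
KEYSTROKES_PER_WORD = 6  # assumption: five letters plus one space


def words_per_minute(seconds_per_keystroke: float) -> float:
    """Convert a mean interkey interval to words per minute."""
    return 60.0 / seconds_per_keystroke / KEYSTROKES_PER_WORD


def taps_per_second(interval_seconds: float) -> float:
    """Keying rate implied by a given interval between keystrokes."""
    return 1.0 / interval_seconds


def required_display_rate(fastest_digram_interval: float) -> float:
    """Characters per second a display must echo to keep up with the fastest digrams."""
    return taps_per_second(fastest_digram_interval)


if __name__ == "__main__":
    print(words_per_minute(0.2))        # good typists, short periods
    print(words_per_minute(0.7))        # less frequent keyboard users
    print(required_display_rate(0.08))  # fastest digrams of experienced typists
```

On these assumptions the less frequent users at 0.7 second per keystroke are typing at roughly 14 words per minute, and a display under direct keyboard control must accept bursts of about 12.5 characters per second.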

Baddeley reports that it takes the aver- 
age postal employee with no previous typ- 
ing skill 60 hours of practice to reach a rate 
of 0.67 second per keystroke on material 
containing British postal codes (which may
be more similar to program code than to
plain text). His experiments show that the
most “efficient” training regimen demands
only a single hour of practice per day. If the
schedule is accelerated, the total number of
hours of practice required to reach
criterion-level speed increases [Badd78]. In
view of the ample evidence that poor typing
habits are difficult to shed and that most
self-taught typists do not reach even half
the speed expected from entry-level typists, 
it may be worth considering the benefits of 
specialized training for interactive com- 
puter users. 

In a well-constructed and often cited
experiment, Shaffer and Hardwick measured
the short-term performance of twenty
qualified touch typists at the University of
Exeter. The material ranged from difficult but
coherent text to randomly selected words 
and to random character sequences gener- 
ated by zeroth- and first-order Markov pro- 
cesses. Very little difference was found be- 
tween ordinary prose (average: 0.159 second 
per keystroke) and random words (0.162 
second per keystroke), but the mean
interval between symbols more than doubled
for random letter strings broken into
word-sized chunks [Shaf68]. This is entirely
consistent with observations regarding the
limitation of human information-processing
capabilities [Mill56].

Explanations of the typist’s ability to pro- 
cess meaningful segments more rapidly 
than random sequences usually involve the 
“acquisition of a hierarchy of habits” (in 
this case the ability to type a whole word 
as a single unit), which was first investi- 
gated in an inspiring turn-of-the-century 
study of novice, apprentice, and master
telegraphers [Brya99]. Another possible
explanation is that typists may be able to
read farther ahead (without forgetting the
material) on words as opposed to random
character strings [Hers65]. Other experiments
on key-to-disk data entry with experienced
keypunch operators, however, yielded a
median interstroke interval of about 200
milliseconds regardless of the type of
material (numerals, alphanumeric codes, or
English words) as long as the “text” was
broken up into groups of about five symbols
[Neal77].

Since the keystroke sequences in many 
editing tasks do not resemble ordinary 
prose, it would be of some interest to deter- 
mine whether or not data derived from such 
transcription tasks are of any value in esti- 
mating parameters for keystroke models. A 
frequently verified observation that might 
transfer to experiments on text editing is 
that short-term timed tests generally yield 


estimates of productivity that are al 
twice those measured over an eighl-l 
workday [Seib 72], Fatigue effects ma,\ 
count for this difference, since meus 
nients indicate that experienced key »•, 
operators may execute 56,000 to 8,'i 
keystrokes per day, corresponding to
0.51-0.85 second per keystroke [Alde72].

3.1.3 Keying Errors 

As in the case of typing rates, most of the
published information on errors was not
obtained in an editing environment.
Furthermore, error rates vary much more from
operator to operator and from task to task
than does speed. Error rates of keypunch
and bankproof machine operators at several
installations were found to range from
0.02 to 0.04 percent [Klem62]. Shaffer and
Hardwick, in the typing experiment
mentioned above, observed a 0.6 percent
undetected error rate on prose and word
material and 2.3 percent on random character
strings. They characterized the errors as
those of omission, response, reading,
context, and random, and report the
distribution among the various types [Shaf68].
Baddeley’s postmen, at criterion speed,
committed 1.0 percent errors on mail codes
(possibly a strong argument for electronic
mail) and immediately detected 50 percent
of them. With a three-month layoff without
practice the error rate doubled, while the
speed decreased by only 30 percent
[Badd78].

John Long studied the effects of visual
feedback on keying performance. He
showed that masking the keyboard reduces
the speed and accuracy of skilled typists,
while masking the printed text reduces only
the accuracy. Masking the keyboard
increased the error rate from 0.9 to 2.6
percent per character and decreased the speed
by about 30 percent. Masking the copy
increased the error rate by 40 percent
because the operator failed to catch many
errors [Long76a].

Long also studied delayed irregular feedback
by having the terminal print a symbol
with a considerable delay (averaging li 
milliseconds) after the corresponding key
was pressed. He showed that such delay
affects only unskilled operators. Skilled
operators tolerated the delayed auditory and
visual feedback with no degradation in
performance after a brief training period
[Long76b]. Long’s principal interest in
these studies was the verification of psy- 
chological feedback models for highly prac- 
ticed perceptual motor skills rather than 
improved device design. 

In other typing experiments performed 
using a line-oriented text editor, the vast 
majority of the errors were noted and cor- 
rected before each line was entered; a much 
smaller number had to be corrected subse- 
quently using the text editor. Furthermore, 
the residual word error rate after entry and 
correction of the text (a lengthy technical 
article) was 0.52 percent as compared with 
3.40 percent in a five-minute timed test 
where no corrections were allowed 
[Scho75]. The overall keystroke rate 
adopted by an individual of given skill is 
governed by the number of detected errors; 
if too many errors are incurred, the
individual slows down [Rabb70]. The specific
trade-off point may be shifted by the re- 
ward structure imposed, which may stress 
either speed or error-free performance 
[Alde72]. Hence we might expect that in
text editing, commands (where a mistake
might be disastrous — would be entered 
more hesitantly than text. The keying rate 
would also be governed by how conven- 
iently errors may be corrected. The number 
of errors generally increases with word or 
code length; three or four characters appear 
to be the optimum group length [Seib72].

Among different typists speed and error 
rate seem to be inversely correlated: the 
faster typists are more accurate [Carl63].
Among typists working approximately at 
the same speed, large differences in error 
rate are common: the most accurate 10 
percent of the typists make six to ten times 
fewer mistakes than the least accurate 10 
percent [Seib72].

Different means of signaling detectable 
typing errors are reported in Sega75.
Seventy subjects performed two tasks (one con-
sisting of entering 20 five-letter permuta- 
tions and the other of listing 25 states) using 
a PLATO terminal. Erroneous entries were 
flagged either immediately or at the end of 
the task. It would appear that immediate 
interruption when an error is committed 
leads to about 25 percent faster task
completion than the indication of errors only
after completion of a major segment of the
task. Other considerations, such as the total
number of erroneous key presses, favor the
delayed indication, but this result was not
significant at the 0.05 level. The tasks and
methods reported in this study are too
specialized, however, to allow drawing any
general conclusions regarding the most
suitable form of error indication.

The relatively high rate of simple, “easily
detectable” errors in keyed input gives
rise to the question of automatic correction
of such errors. A number of researchers,
among whom C. R. Blair [Blai60] is usually
accorded precedence, have tackled the
problem, and several commercially available
word-processing systems actually
incorporate spelling verification routines
[Pete80]. Automatic error-correction
techniques are described in Muth77, which also
contains many references to earlier experi- 
ments. 

It is, of course, tempting to think of spell- 
ing correction as an integral part of editing. 
One cannot help asking to what level and 
complexity automatic error-correction 
techniques can be extended eventually. 
Currently available editors already provide 
some help of this type. Illegal commands 
and arguments are flagged immediately, al- 
though no automatic correction is provided. 
One level deeper, editors imbedded in con- 
versational programming languages call on 
the interpreter to analyze a line of program 
code as soon as it is entered and provide a 
warning of incorrect syntax [Wilc76].
Indications of execution errors may also be
provided. The thought of immediate auto- 
matic correction of some of these errors, 
particularly in an interactive environment 
where the programmer can always override 
a miscorrection, is appealing [Fosd78].
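
A minimal sketch of such overridable correction, applied to command names, is given below. It is not drawn from any of the systems cited: the command list is hypothetical, the similarity threshold is arbitrary, and the matching relies on the `get_close_matches` function of Python’s standard `difflib` module rather than on any technique described in Muth77.

```python
import difflib

# Hypothetical command vocabulary, for illustration only.
COMMANDS = ["delete", "insert", "replace", "move", "copy"]


def suggest(typed, commands=COMMANDS, cutoff=0.75):
    """Return the legal command closest to what was typed, or None."""
    if typed in commands:
        return typed
    close = difflib.get_close_matches(typed, commands, n=1, cutoff=cutoff)
    return close[0] if close else None


def interpret(typed, confirm):
    """Correct an illegal command only if the user accepts the correction.

    confirm(typed, candidate) is a callback standing in for the interactive
    dialogue; returning False overrides the proposed correction.
    """
    candidate = suggest(typed)
    if candidate is None:
        return None  # flag as illegal; nothing plausible to propose
    if candidate == typed or confirm(typed, candidate):
        return candidate
    return None


if __name__ == "__main__":
    print(interpret("delte", lambda typed, candidate: True))
    print(interpret("xyz", lambda typed, candidate: True))
```

Keeping the confirmation step separate from the matching step reflects the point made above: a miscorrection is tolerable only if the user can always reject it before the command takes effect.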

3.2 Display Devices and Screen-Oriented 
Operations 

Typewriter-oriented editors provide a key- 
board as the user’s sole means of commu- 
nications with the system, but screen-ori- 
ented editors may provide additional di- 
mensions for communication by means of 
graphic input devices. In this section we 
review the literature on display devices and
experimental observations on pointing
devices and pointing skills, while adding our
usual caveat that much of the information
has been collected without regard for the 
specific requirements of editing tasks. 

3.2.1 Display Format and Terminal Design

A comprehensive set of human factors 
guidelines for man/display interfaces is pre- 
sented in EngE75. According to Engel and 
Granda their major thrust was “to state
guidelines based on observable, reported 
evidence gathered in some systematic man- 
ner rather than to rely on hearsay, personal 
preference, or programming convenience.” 
Over 100 specific suggestions are made. Un- 
fortunately, they are not related by citation 
to the bibliography of the report, and it is 
impossible to determine to what extent 
they meet the authors’ objectives. 

The suggestions are divided into the
categories of display format, frame content,
command language, recovery procedures, 
user entry techniques, general principles, 
and response time requirements. Some ex- 
amples from the section on display formats 
that are applicable to editing are 

• Display string of five or more digits or 
alphanumeric characters in groups of 
three or four. 

• Number menu items starting with one, 
not zero. In counting, people start with 
one; in measuring, they start with zero. 

• Use vertically assigned lists with left jus- 
tification for most rapid scanning. Sub- 
classifications can be identified by in- 
denting. 

• Use left justification with text, right jus- 
tification for numerals. 

• Always place a period after item selection 
number, at the end of a sentence, and 
where necessary for clarification. 

• Make sure that abbreviations, mnemon- 
ics, and acronyms do not include punc- 
tuation. 

• Use numbers only when listing selectable 
items. Alphabetic characters or bullets 
may be used in prose/text. 
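
Several of these suggestions translate directly into formatting routines. The sketch below is our own reading of three of them (grouping long strings, numbering menu items from one with a period after the number, and justifying text left and numerals right); the function names, field widths, and the choice to group from the left are assumptions rather than anything specified in EngE75.

```python
def group_digits(text: str, size: int = 3) -> str:
    """Break a string of five or more characters into groups of `size`."""
    if len(text) < 5:
        return text
    return " ".join(text[i:i + size] for i in range(0, len(text), size))


def format_menu(items):
    """Number menu items from one, with a period after each number."""
    width = len(str(len(items)))
    return [f"{n:>{width}}. {item}" for n, item in enumerate(items, start=1)]


def format_row(label: str, value: int, label_width: int = 12, value_width: int = 8) -> str:
    """Left-justify the text label and right-justify the numeral."""
    return f"{label:<{label_width}}{value:>{value_width}}"


if __name__ == "__main__":
    print(group_digits("1234567"))
    for line in format_menu(["Insert", "Delete", "Replace"]):
        print(line)
    print(format_row("Lines", 42))
    print(format_row("Characters", 1875))
```

Right-justifying the numerals keeps their units digits aligned in a vertical list, which is what makes columns of figures easy to compare at a glance.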

Pankove, in the introduction to a recent
anthology on display design, lists the
following principal psychophysical factors
related to display perception [Pank80]:
luminance and brightness; color (hue,
saturation, brightness); contrast; directional
visibility; and size and resolution. The
minimum detectable visual stimulus consists
of 60 quanta of blue-green (510 nanometer)
light. Luminance is a measure of the
radiation emitted by an object, while
brightness takes into account the variation in
the sensitivity of the human eye with
wavelength. Since the human eye is capable of
adapting to a 10' : 1 range of light levels
(with most of the adaptation taking place in
the retina and only about 20:1 in the pupil),
the brightness requirements for a display
depend primarily on the ambient
illumination.

The smallest picture element need not 
smaller than the size needed to subtend 
minutes of arc, the effective resolution 
the retina. This corresponds to 25 elemet 
per centimeter at a viewing distance of 
centimeters. Display contrast (the norm, 
ized difference between the brightest a, 
least bright spots) ranges from about 3 
20, and is adjustable for operator comb 
on most display terminals. A gradual t v 
fold change in contrast is normally intpi 
ceptible on large displays. 

An ergonomic approach to the design 
a specific display terminal is described 
Olso80. Olson claims that the optimal di 
play character is 2.54 by 3.18 millimeters 
size (4 : 5 aspect ratio) and is defined on 
7x11 dot matrix. His characters are so 
rounded by a one-dot-wide margin and lo\ 
ercase characters have two-dot descende 
for increased legibility. The display color 
yellow-green on a grey background and 
contrast level of 3 : 1 is achievable even 
very high ambient illumination. The curs, 
is a graphic rectangle alternating with tl 
character occupying the same position at 
rate sufficient to provide comfortable rea. 
ing of the indicated character. The dispin 
face tilt adjustment is -10 to +30 degre. 
relative to the vertical, and 90 degrees i 
horizontal rotation. The refresh rate is < 
hertz driven by a crystal clock to alio 
flicker-free operation on 50-hertz line fn 
quency. A P-31 phosphor was selected ; 
the best possible compromise to satisfy tl. 
ergonomic requirements of maximum ret 
nal efficiency, maximum character sharj 
ness, good display contrast, and minimut 
flicker without character smear. 
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The complex perceptual relationships be- 
tween phosphor persistence, regeneration 
rate, display resolution, and scan order are 
examined in Goul68 and Dill 70. The con- 
clusion of Gould and Dill is that the major 
effect of a pseudorandom scan order, as 
opposed to a raster scan, is to reduce the 
disturbing effects of display flicker when it 
does occur rather than to reduce substan- 
tially the regeneration rate required for 
flickerless display. They argue, on the basis 
of their experiments, that the minimum 
acceptable (flickerless) regeneration rate 
for computer-controlled cathode-ray-tube 
terminals with static displays is 15-20 
frames per second. 

Although most editor displays (except for one-line displays incorporated
into typewriter terminals in word-processing units) are based on
cathodoluminescence (cathode-ray terminals), Pankove lists
light-emitting diodes (LED), plasma panels, electroluminescence (EL),
incandescence, liquid crystals (LC), electrochromicity (ECD),
electrophoresis (EPID), and electrooptic modulation as possible display
mechanisms [Pank80].

For additional references on the ergonomic aspects of video terminal
design, the reader may turn to Caki79, which has a bibliography of 363
titles.


3.2.2 Pointing and Cursor Control Devices

Positional reference to a particular item displayed on a screen is one
of the most common requirements in program editing and may be
accomplished either by pointing directly at the item or by moving a
cursor to its location. The cursor may be controlled either by the
keyboard or by means of some ancillary analog positioning device. In
normal editing operations, positional references are interspersed with
alphanumeric entry of commands or text using the keyboard. The major
design question, therefore, is to determine the most rapid, accurate,
and convenient means of positional referencing.

A fairly complete description of the various devices available in 1975
for positional referencing is presented in Ritc75. Ritchie considers the
light pen, tracker ball, mouse, knee control, joystick, touch panel, and
graphic tablet or coordinate digitizer. The article is most thorough in
exploring the innumerable physical phenomena that may be exploited for
curve tracing. No major new devices have become popular in the
intervening six years, although several terminals are now equipped with
X-Y wheels and technical improvements have taken place in light-pen and
touch-panel design.

Engel and Granda also discuss the advantages and disadvantages of
various cursor control devices without, however, specific reference to
theoretical or experimental work supporting their views [Enge75]. They
consider the following devices:

(1) light pen (a light-sensitive device),

(2) selector pen (a light-emitting device),

(3) joystick,

(4) track ball,

(5) mouse,

(6) thumb wheels,

(7) digitizer stylus,

(8) keyboard.

Their overall recommendations favor the joystick on the basis of control
characteristics and range of applications. They suggest that both rapid
movement and vernier modes be made available under user control, and
that the selectable areas on the screen be made as large as possible.

3.2.3 Pointing Skills

In a 1965 experiment English and his colleagues compared the speed and
accuracy of positional referencing using a mouse, joystick, knee
control, and light pen [Engl67]. Since English et al. were specifically
interested in editing, their experimental design took into consideration
the "access time" of the devices, that is, the time necessary for the
hand to leave the keyboard, execute the pointing task, and return to the
keyboard. Both experienced and inexperienced subjects and both coarse
and fine ("word" and "character") pointing tasks were studied. With the
mouse, knee control, and joystick, the cursor was moved on the screen
using visual feedback; with the light pen, the pointing action was
direct. The major conclusion of the experiment was that in the
circumstances studied the mouse is the preferred device. The authors,

Computing Surveys, Vol. 13, No. 1, March 1981 

Behavioral Aspects of Text Editors 


TABLE 8. COMPARISON OF POINTING DEVICES^a

Mean Time (Seconds) to Locate Operand

Experimental Condition                         Mouse  Grafacon  Light Pen  Joystick
                                                                           (absolute)

Experienced subjects, "character mode,"
  no penalty for errors                        1.?3   2.43      2.1?       2.87
Experienced subjects, "character mode,"
  30 percent penalty for errors^b              1.00   2.57      2.28       3.3?
Experienced subjects, "word mode,"
  no penalty for errors                        1.08   1.92      1.81       1.99
Experienced subjects, "word mode,"
  30 percent penalty for errors                1.74   1.97      1.93       2.07
Inexperienced subjects, "character mode,"
  no penalty for errors                        2.02   3.20      2.43       3.29
Inexperienced subjects, "character mode,"
  30 percent penalty for errors                2.71   3.51      2.04       3.54

a Adapted from Engl67, © IEEE 1967.
b Engl67 does not offer a rationale for the 30 percent error penalty.


however, were very careful to point out the restricted scope of their
assumptions. The overall results are shown in Table 8; it is seen that
the differences between the devices are minor. Unfortunately, the use of
the keyboard for controlling the cursor was not included in this
experiment.

In a subsequent experiment Goodwin compared the light pen, light gun,
and keyboard in three cursor control tasks [Good75]. (The light gun is
simply a pistol-grip mount for the light pen, with the somewhat awkward
switch on the original pen replaced by a trigger. As it turned out,
there was no significant observed performance difference between the
light pen and the light gun, although the latter was better liked.) The
three tasks consisted of pointing to randomly appearing spots on the
screen, pointing to a series of spots in sequential top-to-bottom,
left-to-right order, and pointing to typographic errors in a segment of
text. Whenever the cursor reached the required position, the subject had
to enter an "x" on the typewriter. Unfortunately, an x, even an
uppercase X, can be entered with one hand; consequently the experiment
did not really simulate the interspersed pointing and text entry typical
of editing. Not surprisingly, the light pen/light gun proved faster than
positioning by means of the very awkward keyboard arrangement provided
(the spatial arrangement of the cursor control keys did not correspond
to the direction of cursor movement, and some cursor motions required
pressing the shift key as well). Nevertheless, the author reports that
in the course of routine operations, persons who have either alternative
available frequently use the keyboard. They tend to use the light pen
mainly for reverse-direction operations, which in that particular
keyboard require a multiple shift-key action. As also pointed out by
English, the time required to reach a target with the light pen is
independent of the initial position of the cursor and depends mainly on
the final accuracy required (this assumes, of course, that the initial
position of the light pen is independent of that of the target). On the
other hand, with keyboard control the time is almost directly
proportional to the length traveled by the cursor. In either case the
size of the target is important: it is faster to point to a word than to
a period.

It should also be mentioned that in both experiments the authors
reported some dissatisfaction with the light pen with regard to
accuracy, since signals are sometimes picked up from adjacent
characters. Prolonged use of the light pen was also reported to be
fatiguing since the arm cannot rest. The time necessary to pick up and
deposit the light pen would be saved by using the touch panel, but the
ordinary human index finger is not shaped correctly for selecting
1/8-inch characters. In one design mentioned by Ritchie the cursor
position is offset from the sensed position of the finger
TABLE 9. COMPARISON OF CURSOR CONTROL DEVICES^a

Overall Times

             Movement Time for Nonerror Trials (seconds)
             Homing Time    Positioning Time    Total Time     Error Rate (%)
Device       M      SD      M      SD           M      SD      M     SD

Mouse        0.36   0.13    1.29   0.42         1.66   0.48    5     22
Joystick     0.26   0.11    1.57   0.54         1.83   0.57    11    31
Step keys    0.21   0.30    2.31   1.52         2.51   1.64    13    33
Text keys    0.32   0.61    1.95   1.30         2.26   1.70    9     28

Summary of Models for Positioning Time ($T_{pos}$)

Mouse        $T_{pos} = 1.033 + 0.096 \log_2(D/S + 0.5)$
Joystick     $T_{pos} = 1.036 + 0.205 \log_2(D/S + 0.5)$
Step keys    $T_{pos} = 1.197 + 0.052\,(D_x/S_x + D_y/S_y)$
Text keys    $T_{pos} = 0.658 + 0.209\,N_{min}$

a Adapted from Card78b.


in such a manner that the finger can guide 
the cursor freely without obscuring it 
[Ritc75]. 

The relative speeds of indirect pointing 
methods and keyboard-controlled cursors 
in selecting targets for editing tasks were 
eventually also compared in Card78b. The
pointing devices included in this experi- 
ment were the mouse and a rate-controlled 
isometric joystick (a peculiar choice since 
a direct-reading joystick was shown earlier, 
by a group which also included English, to 
be superior to the rate-controlled joystick). 
The key controls consisted of step keys 
(where the horizontal and vertical motions 
are independent of the information dis- 
played) and text keys (which can cause the 
cursor to skip entire words or paragraphs). 
The light pen was not included in this ex- 
periment. 

Learning effects which might favor one 
device or another were carefully eliminated 
by training all subjects to criterion. It was 
demonstrated, however, that the learning 
curves of positioning time versus amount of 
practice can be approximated by a power 
curve, as predicted in DeJo57. 
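
A power curve of this kind can be written as T_n = T_1 * n^(-alpha), where T_n is the positioning time on the n-th trial. The sketch below shows one common way to estimate the two parameters, by a least-squares line fit in log-log coordinates. The symbols T_1 and alpha, the function names, and the fitting procedure are assumptions for illustration; they are not taken from Card78b or DeJo57, and no experimental data are reproduced here.

```python
import math


def fit_power_law(trials, times):
    """Fit log(T) = log(T1) - alpha * log(n) by ordinary least squares.

    Returns (T1, alpha) for the model T_n = T1 * n ** (-alpha).
    """
    xs = [math.log(n) for n in trials]
    ys = [math.log(t) for t in times]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # equals -alpha
    t1 = math.exp(mean_y - slope * mean_x)  # intercept back-transformed
    return t1, -slope


def predict_time(t1, alpha, n):
    """Predicted time on trial n under the fitted power law."""
    return t1 * n ** (-alpha)
```

On a log-log plot such a learning curve appears as a straight line with slope -alpha, which is why a simple linear fit suffices.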

Both positioning speed and error rate 
were studied in a comprehensive experi- 
mental design that allowed the study of 
these variables as a function of approach 
angle to the target, target distance from the 
initial position, and target size. Card and 
his colleagues demonstrate that the relation 
between positioning time, target size, and 
target distance obeys a version of Fitts’ law 
[Welf68]. According to Fitts’ law the time 
necessary to make a hand movement to a 


predetermined position may be expressed as

$$\text{Positioning time} = K_0 + K \log_2(D/S + 0.5)\ \text{seconds},$$

where $D$ is the distance moved, $S$ is the size of the target, and
$K_0$ and $K$ are constants.

Although Card computes the values of the constants $K_0$ and $K$ under
various conditions, for our purpose Table 9 provides a
sufficient idea of the relative magnitudes of 
the phenomena involved and of the appro- 
priate values of the parameters in Fitts’ 
law. In Table 9, which shows values averaged over several experiments,
homing time is measured from the time the subject's right hand leaves
the space bar until the cursor begins to move. Positioning time is the
interval from the beginning of cursor movement until the selection
button is pressed. $N_{min}$ is the minimum number of keystrokes
necessary to reach the target with the text keys. The error rate is the
fraction of unsuccessful trials in attempting to reach a target with an
average size of 4.2 centimeters.
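
As a rough illustration of how the positioning-time models summarized in Table 9 might be evaluated, the following sketch codes the four formulas with the coefficients as they appear in the table. The function names are hypothetical, the base-2 logarithm is assumed from the statement of Fitts' law in the text, and the assignment of each formula to a device follows the row order of the table; none of this should be read as Card's own implementation.

```python
import math


def fitts_time(k0, k, distance, size):
    """Fitts' law form: K0 + K * log2(D/S + 0.5), in seconds."""
    return k0 + k * math.log2(distance / size + 0.5)


def mouse_time(distance, size):
    # Coefficients as printed in Table 9 for the mouse.
    return fitts_time(1.033, 0.096, distance, size)


def joystick_time(distance, size):
    # Coefficients as printed in Table 9 for the joystick.
    return fitts_time(1.036, 0.205, distance, size)


def step_keys_time(dx, dy, sx, sy):
    """Step keys: time grows linearly with horizontal plus vertical steps."""
    return 1.197 + 0.052 * (dx / sx + dy / sy)


def text_keys_time(n_min):
    """Text keys: time grows linearly with the minimum keystroke count."""
    return 0.658 + 0.209 * n_min
```

Because the logarithmic coefficient for the mouse is less than half that of the joystick, the gap between the two devices widens as the ratio of distance to target size increases, while the step-key model grows linearly with distance.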

Card et al. conclude that the mouse is 
the uniformly superior device with respect 
to both speed and error rate. They also 
claim that the mouse approaches the phys- 
iological limits of performance for an analog 
pointing device. To credit these conclusions 
fully, however, more information than is 
contained in the paper would be required 
regarding the manner of computing the 
error rate, the exact configuration of the 
screen display, and the choice of cursor 
velocity under repeat-key action. 
3.2.4 Comparison of Key Entry and Menu Selection

In a recent series of experiments with 32 
subjects, Fields and her colleagues studied 
four methods of tactical data input simu- 
lating battlefield information requirements 
[Fiel78]. The four methods, all of which 
appear directly applicable to program and 
text editing, were (1) typing English words 
or codes; (2) typing with an autocorrection 
feature that attempted to correct transpo- 
sition and single-character deletion, inser- 
tion, and substitution errors; (3) typing with 
an autocompletion feature that automati- 
cally completed nonambiguous entries and 
submitted them for verification to the op- 
erator; and (4) menu selection using a track 
ball (a stationary ball sunk in the console 
which can be rotated in any direction with 
the palm of the hand). 
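
To make the second and third methods concrete, here is a minimal sketch of autocompletion of nonambiguous entries and of autocorrection limited to a single transposition, deletion, insertion, or substitution. The function names, the lowercase alphabet, and the rule of returning nothing when more than one candidate matches are assumptions made for this illustration; Fiel78 is not the source of this code, and the actual algorithms used in the experiment may differ.

```python
def autocomplete(prefix, vocabulary):
    """Return the only vocabulary word beginning with `prefix`, else None."""
    matches = [word for word in vocabulary if word.startswith(prefix)]
    return matches[0] if len(matches) == 1 else None


def single_edit_variants(word, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """All strings one transposition, deletion, insertion, or substitution away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletions = {a + b[1:] for a, b in splits if b}
    transpositions = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    substitutions = {a + c + b[1:] for a, b in splits if b for c in alphabet}
    insertions = {a + c + b for a, b in splits for c in alphabet}
    return deletions | transpositions | substitutions | insertions


def autocorrect(word, vocabulary):
    """Return the unique vocabulary word within one edit of `word`, else None."""
    if word in vocabulary:
        return word
    candidates = single_edit_variants(word) & set(vocabulary)
    return candidates.pop() if len(candidates) == 1 else None
```

Both functions decline to guess when the entry is ambiguous, which mirrors the idea of submitting a proposed completion to the operator for verification rather than accepting it silently.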

The lowest error rate was obtained 
through menu selection (Table 10). Only 
minor differences were obtained in entry 
rates, perhaps because the subjects were 
improving rapidly throughout the experi- 
ment. Autocompletion provided the fastest, 
as well as the least liked and most error- 
prone, means of data entry. The authors 
recommend the incorporation of menu se- 
lection schemes in tactical message com- 
munications systems, but urge replacement 
of the track ball with a light pen, touch 
panel, or typed code. They also recommend 
retention of the typing option for applica- 
tions involving experienced operators and 
very long menus. They are not convinced 
of the value of automatic spelling correction
in this application and believe that the au- 
tocompletion method is too confusing for 
novice operators. The experiment also pro- 
vides incidental support for the importance 
of split screens for editing operations (al- 
though in this instance two separate display 
terminals were used for each operator). 

4. CONCLUSIONS 

We have reviewed human-factors-oriented 
studies applicable to interactive text edi- 
tors. In this section we first review behav- 
ioral aspects of text editors that have been 
explored only superficially or not at all and 
state our sense of important outstanding 
design issues. We then summarize progress 


TABLE 10. COMPARISON OF DATA ENTRY TECHNIQUES

                                 Mean Number      Mean Time
                                 of Errors        (Seconds)
Method                           per Message      per Message

Menus                            2.64             39?
Typing with error correction     3.36             39?
Typing                           3.77             39?
Typing with autocompletion       4.39             413


in the areas that have been investigated with some degree of
thoroughness.

4.1 Outstanding Design Issues

The amount of text, program code, and data accessible through
interactive systems (this is the age of the information utility, the
integrated office, the on-line management information system) is
increasing rapidly, as is the number of people requiring daily or
sporadic access to such information. Nevertheless we found no studies
addressing the human factors aspects of the use of editors for
searching, inspecting, or maintaining massive file systems; most of the
activity centers instead on symbol manipulation in tasks of very limited
scope where really significant improvement over the best of the
present-day editors appears unlikely. We believe that as the boundaries
between interactive editors, database query languages, and nonprocedural
programming systems continue to fade, rich returns can be expected from
research on the most appropriate conceptual organization of the vast
amounts of information whose value is limited only by the user's
patience and ability to extract the needed portions.

With regard to editors, the question can be posed in terms of the
underlying model of information structure. For example, what are the
behavioral advantages and disadvantages of editors incorporating nested
subsections and structural information about the semantic content of
files ("hypertext") as compared with conventional "flat-text" editors? A
major unresolved issue is what constitutes the most convenient means of
allowing the user combined access to parts of several files, or even of
the same
64 • D. W. Embley and G. Nagy

file. Because file structures tend to vary 
among editors far more than commands 
applicable only within a file, even expert 
users may experience difficulties with file 
manipulation. 

A related, but perhaps less important, 
problem is the evaluation of techniques for 
selecting small segments of text from a lim- 
ited corpus (i.e., file). Competing methods 
include unique addresses (e.g., line num- 
bers), content addressing (search pattern 
matching), and positional selection (i.e., 
pointing with cursor control or light pen). 
Questions to be answered include those of 
relative speed, error probability, and learn- 
ability of each technique. 
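
The three competing selection methods can be contrasted in a few lines of code. This is a sketch under stated assumptions: the file is modeled as a list of lines, line numbers are 1-based, cursor coordinates are 0-based, and all function names are hypothetical rather than drawn from any editor discussed in this survey.

```python
import re


def select_by_line_numbers(lines, first, last):
    """Unique addresses: return lines first..last (1-based, inclusive)."""
    return lines[first - 1:last]


def select_by_content(lines, pattern):
    """Content addressing: return (line number, line) of the first match, or None."""
    regex = re.compile(pattern)
    for number, line in enumerate(lines, start=1):
        if regex.search(line):
            return number, line
    return None


def select_by_position(lines, row, column):
    """Positional selection: return the word under a cursor at (row, column), or None."""
    for match in re.finditer(r"\S+", lines[row]):
        if match.start() <= column < match.end():
            return match.group()
    return None
```

The sketch makes the trade-off visible: line numbers require the user to know an address, content addressing requires composing a pattern, and positional selection requires only a cursor location but depends on a pointing or cursor-control device.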

The most suitable form of editor commands remains a subject of
contention. Most editors, like most programming languages, tend to use
prefix operators (print m1, m2), but some, including the popular UNIX
editor, use postfix (m1, m2 print) or infix (m1, m2 move m3) notation.
Command abbreviations and appropriate default options for operands also
need more systematic investigation.
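
The difference between the three notations is purely one of surface order, as the following sketch suggests: it reduces a command in any of the three forms to the same operator and operand list. The operator set, the function name, and the tokenizing rule are hypothetical and chosen only for illustration; they do not describe the syntax of any actual editor.

```python
OPERATORS = {"print", "move", "delete"}  # hypothetical command set


def parse_command(text):
    """Return (operator, operands, notation) for a one-operator command line."""
    tokens = text.replace(",", " ").split()
    positions = [i for i, token in enumerate(tokens) if token in OPERATORS]
    if len(positions) != 1:
        raise ValueError("expected exactly one operator in: " + text)
    index = positions[0]
    operands = tokens[:index] + tokens[index + 1:]
    if index == 0:
        notation = "prefix"
    elif index == len(tokens) - 1:
        notation = "postfix"
    else:
        notation = "infix"
    return tokens[index], operands, notation
```

With this reduction, "print 10, 20" and "10, 20 print" yield the same operator and operands, so any behavioral difference between the notations lies in how users compose and read them, not in what they express.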

While most users familiar with both 
types of editors express a preference for 
two-dimensional screen-oriented editors 
over one-dimensional hard-copy-oriented 
editors, we have not seen any experiments 
comparing them with respect to productiv- 
ity — although both varieties are available 
on the same CRT terminals under several 
systems, including CMS and UNIX. 

The matter of appropriate notation for 
the specification and analysis of editors, 
while touched upon here and there, has not 
been satisfactorily dealt with and needs 
further investigation. Perhaps some of the 
current work on the formal design and ver- 
ification of machine-to-machine communi- 
cations protocols will also prove useful in 
the design of human-computer communi- 
cations. 

Only modest steps have been taken to 
study the nature of error feedback and the 
benefits of automatic error detection and 
error correction features. 

Additional experimentation is needed on 
split-screen and multiple-screen editing op- 
erations. An important variable that does 
not appear to have been investigated is the 


screen size and the amount of material exposed to the user (who may be
disoriented by frequent changes of screen). In current word-processing
systems the amount of material displayed ranges all the way from half a
line on a liquid crystal display to two entire pages of 60 or 70 lines.
The most common display remains, however, the standard 24 by 80 CRT.

A very recent idea is the application of
an optical scanner to manuscript editing
[Suen80]. A facsimile device is used to
enter the rough draft and an accompanying 
transparent overlay of proofreading marks, 
and to print the edited output. The system 
can process cursive handwriting, hand 
printing, typed copy, line drawings, and 
continuous-tone pictures. When the input 
is not machine readable, the system per- 
forms the indicated additions, deletions, 
and rearrangements, producing a “clean” 
version. With machine-readable input the 
characters are encoded by an optical char- 
acter recognition (OCR) subsystem in a 
form suitable for further editing using 
either the same system or a conventional 
editor. A graphics subsystem produces fin- 
ished versions of hand-drawn sketches, in- 
cluding lettering; photographs may be 
scaled and inserted in the text. The concept 
of a scanner-computer combination neces- 
sitating only pencil and paper instead of a 
keyboard terminal clearly opens an entirely 
new dimension for human factors research 
on “interactive” editing. 

Other technological advances that might 
enhance editing are color displays (and 
printers), audio input, and audio feedback. 
All of these are at the stage where serious 
consideration should be given to their ap- 
plication. Not at the same stage of readi- 
ness, but also within the realm of possibil- 
ity, is the direct use of eye movement for 
pointing and menu selection. 

Although text editors are generally con- 
sidered simpler than procedure-oriented 
high-level programming languages, they are 
less standardized. Practically every instal- 
lation boasts its own editor. It is our hope 
that as editor design gradually becomes 
more of an engineering-oriented discipline 
with a solid knowledge base buttressing 
design decisions, rational standardization 
will lead toward improved portability of
interactive editing skills. 

4.2 Summary of Results 

Here we review the findings that appear to us most likely to influence
the design and evaluation of interactive text editors.

• Keystroke models can predict task time for expert users performing
routine tasks with an accuracy comparable to individual variations
between subjects. This accuracy is achieved, however, under the
simplifying assumption that users perform editing tasks perfectly,
without error, and thus represents an upper bound on how well an expert
user might perform (Section 1.1).

• For an expert individual user the 
choice among alternative methods for per- 
forming routine editing tasks can be pre- 
dicted with reasonable accuracy by means 
of a few simple selection rules as suggested 
by the GOMS model. Unless the sequence 
of commands used to accomplish an editing 
task can be predicted nearly perfectly, how- 
ever, the level of detail at which editing is 
analyzed appears to be of little consequence 
(Section 1.2). 

• Observations of subjects performing 
editing tasks in a controlled experimental 
environment reveal that error detection 
and correction, suboptimal choice of editing 
methods, and unpredictable mental activi- 
ties account for between 25 and 50 percent 
of the task time. About as much difference 
between expert users can be attributed to 
user variability (mostly in error rate) as can 
be attributed to differences between editors 
(Section 1.3). 

• Flexibility and options tend to increase 
the rate at which expert users can accom- 
plish editing tasks but tend to reduce the 
rate for beginners (Section 2.2). 

• Editor surface syntax, such as familiar, 
descriptive, English-like phrases versus the 
arcane notation found in some command 
languages, can increase editing efficiency 
and reduce task completion time and error 
rate. Data from one experiment indicate 
that these effects are more pronounced for 
users without previous experience with ed- 
itors (Section 2.4). 


• The various studies of editors, ranging from timed tests to formal
analysis of states and to the effects of surface syntax, indicate,
however, that there is less than a 2 : 1 difference between editors in
common use today. Most users can perform about equally well on any
reasonable editor (Sections 1 and 2).

• Ambient conditions for professional keyboard operators, including the
effects of temperature, noise, work station layout, illumination, and
work-rest cycles, have been extensively studied and the results probably
apply to editing as well (Section 3).

• The ergonomic aspects of keyboard layout and design are well
understood. Keying rates cannot be significantly improved without
specialized and lengthy training, but there are established training
rules for developing typing speeds of the order of [...] second per
keystroke on short tasks. Productivity for an eight-hour day is about
half that predicted from peak rates. Variations in keying speed and
error rate are predictable as a function of input material, as are
speed/error trade-offs as a result of the reward structure. Individual
variability among expert keyboard operators is greater with regard to
error rate than to speed. Although some consider type-ahead capability
important for interactive editing, it increases the error rate for text
and data entry (Section 3.1).

• Considerable data have also been accumulated on the design of display
terminals, and appropriate values have been established for display
contrast, color, character size, character shape, refresh rate, and
screen orientation. Results on display-format design are less
definitive, but it appears that menu selection is less error prone (but
slower) than direct item entry. The emphasis on appropriate formatting
of displayed values indicates that human factors considerations can be
ignored only at the peril of considerable performance degradation
(Section 3.2).

• Among display selection mechanisms the mouse appears to be nearly
optimal in terms of pointing skill and accuracy, but its performance is
only marginally superior to that of several other commonly accepted
cursor control mechanisms. For some tasks
keyboard control of the cursor should be 
retained (Section 3.2). 

As is happily always the case, much more 
remains to be done than has already been 
accomplished. Text editors represent the 
principal interface with computers for 
many people and therefore deserve a con- 
certed effort toward applying psychological 
and human factors concepts and methods 
to increase their usability. Consumerism in 
the field of computers is here to stay. 
Modes of human-computer interaction in the control of dynamic systems are discussed,
and the problem of allocating tasks between human and computer considered. Models of
human performance in a variety of tasks associated with the control of dynamic systems
are reviewed. These models are evaluated in the context of a design example involving 
human-computer interaction in aircraft operations. Other examples include power plants, 
chemical plants, and ships. 
INTRODUCTION

Large dynamic systems such as nuclear 
power plants, chemical plants, ships, and 
aircraft are becoming increasingly complex, 
and, as a consequence, humans and com- 
puters must interact to control these sys- 
tems. There are several motivations for the 
increased use of computers. First, there is 
a desire for improved performance so that, 
for example, airports can service more air- 
craft per unit time or chemical plants can 
operate with greater efficiency. Safety is 
also a motivation for using computers. A 
good illustration is efforts directed toward 
developing computer-based alarm systems. 
Finally, economic considerations are moti- 
vating the increased use of computers. Im- 
proved efficiency in terms of energy con- 
sumption is a particularly important eco- 
nomic motivation. Further, computer tech- 
nology is becoming less and less expensive 
and consequently is being incorporated into 
system design in order to reduce costs. 

The fact that computers are increasingly 
being introduced into these environments 
does not imply that the human is being 


replaced. Instead, the human's role is changing. The human who used to
be responsible for direct control of these systems is now becoming more
of a monitor and supervisor of the computers that do the direct
controlling [Sher76]. Although this trend is quite clear, it is still
not particularly easy to determine appropriate allocations of tasks
between humans and computers and to devise suitable modes of
human-computer communication. The purpose of this paper is to provide a
survey of alternative approaches for resolving these issues.

Dynamic Systems 

One particular attribute that all of the above examples share is that
they are dynamic systems. A system is dynamic to the extent that its
future outputs depend on its past outputs, as well as on inputs
generated by humans, computers, and the environment. Because the outputs
of dynamic systems depend on more than just the system's inputs, they
continue to evolve regardless of whether or not any control is exercised